Description

Field of the Invention

[0001] The present invention relates generally to speech coding systems and more specifically to a reduction of
bandwidth requirements in analysis-by-synthesis speech coding systems.

Background of the Invention

[0002] Speech coding systems function to provide codeword representations of speech signals for communication
over a channel or network to one or more system receivers. Each system receiver reconstructs speech signals from 
received codewords. The amount of codeword information communicated by a system in a given time period defines 
system bandwidth and affects the quality of speech reproduced by system receivers. 

[0003] Designers of speech coding systems often seek to provide high quality speech reproduction capability using
as little bandwidth as possible. However, requirements for high quality speech and low bandwidth may conflict and
therefore present engineering trade-offs in a design process. This notwithstanding, speech coding techniques have
been developed which provide acceptable speech quality at reduced channel bandwidths. Among these are
analysis-by-synthesis speech coding techniques.

[0004] With analysis-by-synthesis speech coding techniques, speech signals are coded through a waveform matching
procedure. A candidate speech signal is synthesized from one or more parameters for comparison to an original
speech signal to be encoded. By varying parameters, different synthesized candidate speech signals may be determined.
The parameters of the closest matching candidate speech signal may then be used to represent the original
speech signal.

[0005] Many analysis-by-synthesis coders, e.g., most code-excited linear prediction (CELP) coders, employ a long-term
predictor (LTP) to model long-term correlations in speech signals. (The term "speech signals" means actual speech
or any of the residual and excitation signals present in analysis-by-synthesis coders.) During the synthesis process,
an LTP is conventionally realized either as an all-pole filter or as an adaptive codebook with gain scaling. As a general
matter, long-term correlations in speech signals allow a past reconstructed speech signal to serve as an approximation
of a current speech signal. LTPs work to compare several past speech signals (which have already been coded) to a
current (original) speech signal. By such comparisons, the LTP determines which past signal most closely matches
the original signal. A past speech signal is identifiable by a delay which indicates how far in the past (from current time)
the signal is found. A coder employing an LTP subtracts a scaled version of the closest matching past speech signal
(i.e., the best approximation) from the current speech signal to yield a signal with reduced long-term correlation. This
signal is then coded, typically with a fixed stochastic codebook (FSCB). The FSCB index and LTP delay, among other
parameters, are transmitted to a CELP decoder which can recover an estimate of the original speech from these
parameters.

[0006] By modeling long-term correlations of speech, the quality of reconstructed speech at a decoder may be enhanced.
This enhancement, however, is not achieved without a significant increase in bandwidth. For example, in order
to model long-term correlations in speech, conventional CELP coders may transmit 8-bit delay information every 5 or
7.5 ms (referred to as a subframe). Such time-varying delay parameters require, e.g., between one and two additional
kilobits (kb) per second of bandwidth. Because variations in LTP delay may not be predictable over time (i.e., a sequence
of LTP delay values may be stochastic in nature), it may prove difficult to reduce the additional bandwidth requirement
through improved coding of delay parameters.
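
Illustratively, the additional bandwidth quoted above follows from simple arithmetic. The following Python sketch assumes one 8-bit delay value per subframe; the function name `delay_bit_rate` is hypothetical and serves only to make the calculation explicit.

```python
def delay_bit_rate(bits_per_delay: int, subframe_ms: float) -> float:
    """Side-information rate, in bits per second, for one delay value per subframe."""
    return bits_per_delay * 1000.0 / subframe_ms


# Assumption: 8 bits per delay value, subframes of 5 ms or 7.5 ms.
for subframe_ms in (5.0, 7.5):
    rate = delay_bit_rate(8, subframe_ms)
    print(f"{subframe_ms} ms subframe: {rate:.0f} bit/s")
```

Under these assumptions the rate is 1600 bit/s for 5 ms subframes and roughly 1067 bit/s for 7.5 ms subframes, consistent with the range of one to two kilobits per second stated above.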

[0007] One approach to reducing the extra bandwidth requirements of analysis-by-synthesis coders employing an
LTP might be to transmit LTP delay values less often and determine intermediate LTP delay values by interpolation.
However, interpolation may lead to suboptimal delay values being used by the LTP in individual subframes of the
speech signal. For example, if the delay is suboptimal, then the LTP will map past speech signals into the present in
a suboptimal fashion. As a result, the difference between past speech mapped into the present and the original signal
will be larger than it might otherwise be. The FSCB must then work to undo the effects of this suboptimal time-shift
rather than perform its normal function of refining waveform shape. As a result, significant audible distortion may result.

[0008] EP-A-500 961 discloses a voice coding system which finds by evaluation operation a code vector that minimizes
the error between an input voice signal and a reproduced signal obtained through a linear estimation synthesis
filtering simulating the vocal tract characteristics for each of the code vectors successively read out from a code book
that stores a plurality of noise sequences as code vectors, and then encodes the input voice signal by using a code
which specifies a code vector. The code book is constituted as a delta vector code book that stores the initial vector
and a plurality of delta vectors which consist of differential vectors among the neighboring code vectors. Operation
means for the evaluation operation is provided with a cyclic adder means which accumulates delta vectors for virtual
reproduction of said code vectors.
Summary of the Invention

[0009] According to the invention there is provided a method and apparatus as set out in independent claims 1 and
15. Preferred forms of the invention are set out in the dependent claims.

[0010] The present invention provides a method and apparatus for reducing bandwidth requirements in analysis-by-synthesis
coding systems. In accordance with the present invention, generalized analysis-by-synthesis coding is provided
through variation of original signals. Original signal variants are referred to as trial original signals. Use of trial
original signals in place of or as a supplement to the use of original signals in analysis-by-synthesis coding reduces
coding error and bit rate requirements. In the context of speech coding, reduced coding error affords less frequent
transmission of LTP delay information and allows for delay interpolation with little or no degradation in the quality of
reconstructed speech. The invention is applicable to, among other things, networks for communicating speech information,
such as, for example, wireless (e.g., cellular) and conventional telephone networks.

[0011] Regarding speech coding, trial original signals are illustratively signals which are perceptually (e.g., audibly)
similar to the actual original signal. The degree of audible similarity between a trial original signal and the actual original
signal may affect coded bit rate and the quality of speech synthesized by a receiver (e.g., the lower the similarity, the
lower the bit rate and speech quality may be). The original signal, and hence the trial original signals, may take the
form of actual speech signals or any of the residual or excitation signals present in analysis-by-synthesis coders.

[0012] In an illustrative embodiment of the present invention, trial original signals are generated as time-shifted versions
of an original speech signal segment. Measures of similarity (e.g., cross-correlations) between trial original signals
and contributions from an adaptive codebook are evaluated. A trial original signal which is either the same as one of the
trial original signals or a variant of an original or trial original signal is determined based on one or more evaluated
measures of similarity. (In the case of a variant of previously generated trial original signals, the determined trial original
signal (i.e., the variant) may correspond to a time-shift which falls in between time-shifts which produced previously
generated trial original signals.) A signal reflecting a coded representation of the original signal is generated based on
the determined trial original signal.



Brief Description of the Drawings 



[0013] Figure 1 presents a conventional CELP coder.

[0014] Figure 2 presents an illustrative embodiment of the present invention.

[0015] Figure 3 presents windows of samples used in a correlation process estimating open-loop delay.

[0016] Figure 4 presents illustrative time relationships of delay values for use with the embodiment of Figure 2.

[0017] Figure 5 presents an illustrative embodiment of an adaptive codebook processor.

[0018] Figures 6a-c present illustrative sample time relationships for operation of the adaptive codebook of the
embodiment of Figure 2.

[0019] Figure 7 presents an illustrative embodiment of the time-shift processor of the embodiment of Figure 2.

[0020] Figure 8 presents an illustrative set of initial conditions for the operation of the time-shift processor of Figure 7.

[0021] Figure 9 presents a flow diagram of the operation of the time-shift processor of Figure 7.

[0022] Figure 10 presents an illustrative segment of original speech used for generating trial original speech signals
by time-shifting.

[0023] Figure 11 presents an alternative embodiment of the invention.

[0024] Figure 12 presents a finite state machine describing the operation of a delay estimator as it concerns time
synchrony between original and time-shifted signals.

[0025] Figure 13 presents an illustrative receiver/decoder for use with the illustrative coder embodiments presented
in Figure 2 and in Figure 11.



Detailed Description 



Illustrative Embodiment Hardware 

[0026] For clarity of explanation, the illustrative embodiment of the present invention is presented as comprising
individual functional blocks (including functional blocks labeled as "processors"). The functions these blocks represent
may be realized through the use of either shared or dedicated hardware, including, but not limited to, hardware capable
of executing software. For example, the functions of processors presented in Figures 5 and 7 may be provided by a
single shared processor. (Use of the term "processor" should not be construed to refer exclusively to hardware capable
of executing software.)

[0027] Illustrative embodiments of the present invention may comprise digital signal processor (DSP) hardware, such
as the AT&T DSP16 or DSP32C, read-only memory (ROM) for storing software performing the operations discussed
below, and random access memory (RAM) for storing DSP results. Very large scale integration (VLSI) hardware
embodiments, as well as custom VLSI circuitry in combination with a general purpose DSP circuit, may also be provided.

Introduction to Conventional CELP 

[0028] A conventional analysis-by-synthesis CELP coder is presented in Figure 1. A sampled speech signal, s(i),
(where i is the sample index) is provided to a short-term linear prediction filter (STP) 20 of order N, optimized for a
current segment of speech. Signal x(i) is an excitation obtained after filtering with the STP:

$$x(i) = s(i) - \sum_{n=1}^{N} a_n\, s(i-n), \tag{1}$$

where parameters $a_n$ are provided by linear prediction analyzer 10. Since N is usually about 10 samples (for an 8 kHz
sampling rate), the excitation signal x(i) generally retains the long-term periodicity of the original signal, s(i). An LTP
30 is provided to remove this redundancy.
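
By way of illustration only, the short-term prediction filtering of equation (1) may be sketched in Python as follows. The function name `stp_residual` and the convention that samples before the start of the input are zero are assumptions made for this sketch and are not part of the described coder.

```python
def stp_residual(s, a):
    """Compute x(i) = s(i) - sum_{n=1..N} a_n * s(i-n) for every sample of s.

    `a` holds the coefficients a_1..a_N (a[0] is a_1).
    Assumption: s(i) = 0 for i < 0.
    """
    order = len(a)
    x = []
    for i in range(len(s)):
        prediction = sum(a[n - 1] * s[i - n]
                         for n in range(1, order + 1) if i - n >= 0)
        x.append(s[i] - prediction)
    return x


# A first-order example: a decaying signal that the predictor models exactly.
residual = stp_residual([1.0, 0.5, 0.25, 0.125], [0.5])
```

In this example the predictor removes everything except the first sample, which illustrates how the filter strips short-term correlation while leaving any longer-term structure in x(i).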

[0029] Values for x(i) are usually determined on a blockwise basis. Each block is referred to as a subframe. The
linear prediction coefficients, $a_n$, are determined by the analyzer 10 on a frame-by-frame basis, with a frame having a
fixed duration which is generally an integral multiple of subframe durations, and usually 20-30 ms in length. Subframe
values for $a_n$ are usually determined through interpolation.

[0030] The LTP, typically implemented with an adaptive codebook, determines a gain $\lambda(i)$ and a delay d(i) for use as
follows:

$$r(i) = x(i) - \lambda(i)\, \hat{x}(i - d(i)), \tag{2}$$

where the $\hat{x}(i - d(i))$ are samples of a speech signal synthesized (or reconstructed) in earlier subframes. Thus, the LTP
30 provides the quantity $\lambda(i)\, \hat{x}(i - d(i))$. Signal r(i) is the excitation signal remaining after $\lambda(i)\, \hat{x}(i - d(i))$ is
subtracted from x(i). Signal r(i) is then coded with a FSCB 40. The FSCB 40 yields an index indicating the codebook vector and an
associated scaling factor. Together these quantities provide an excitation which most closely matches r(i).
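
The long-term prediction step described above may be sketched, under simplifying assumptions, as follows. The sketch treats the gain and delay as constant over one subframe and restricts the delay to an integer no shorter than the subframe, so that every referenced sample lies in the past; the function name `ltp_residual` and the buffer layout are hypothetical.

```python
def ltp_residual(x, past, gain, delay):
    """Return r(i) = x(i) - gain * xhat(i - delay) for one subframe.

    `past` holds previously reconstructed samples, most recent last.
    Assumption: integer delay with len(x) <= delay <= len(past).
    """
    length = len(x)
    if not (length <= delay <= len(past)):
        raise ValueError("delay must satisfy len(x) <= delay <= len(past)")
    start = len(past) - delay  # index in `past` of the sample xhat(0 - delay)
    return [x[i] - gain * past[start + i] for i in range(length)]


r = ltp_residual([2.0, 3.0], [1.0, 2.0, 3.0, 4.0], gain=0.5, delay=3)
```

Delays shorter than the subframe and non-integer delays require the adaptive codebook procedure discussed later in this description.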
[0031] Data representative of each subframe of speech, namely, the LTP gain and delay d(i), and the FSCB index,
are collected for the integer number of subframes equalling a frame (typically 2 to 8). Together with the coefficients $a_n$,
this frame of data is communicated to a CELP decoder where it is used in the reconstruction of speech.

[0032] A CELP decoder performs the reverse of the coding process discussed above. The FSCB index is received
by a FSCB of the receiver (sometimes referred to as a synthesizer) and the associated vector e(i) (an excitation signal)
is retrieved from the codebook. Excitation e(i) is used to excite an inverse LTP process (wherein long-term correlations
are provided) to yield a quantized equivalent of x(i), $\hat{x}(i)$. A reconstructed speech signal, y(i), is obtained by filtering
$\hat{x}(i)$ with an inverse STP process (wherein short-term correlations are provided).

[0033] In general, the reconstructed excitation $\hat{x}(i)$ can be interpreted as the sum of scaled contributions from the
adaptive and fixed codebooks. To select the vectors from these codebooks, a perceptually relevant error criterion may
be used. This can be done by taking advantage of the spectral masking existing in the human auditory system. Thus,
instead of using the difference between the original and reconstructed speech signals, this error criterion considers
the difference of perceptually weighted signals.

[0034] The perceptual weighting of signals deemphasizes the formants present in speech. In this example, the formants
are described by an all-pole filter in which spectral deemphasis can be obtained by moving the poles inward. This
is equivalent to replacing the filter with predictor coefficients $a_1, a_2, \ldots, a_N$ by a filter with coefficients
$\gamma a_1, \gamma^2 a_2, \ldots, \gamma^N a_N$, where $\gamma$ is a perceptual weighting factor (usually set to a value around 0.8).
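
The coefficient replacement described above may be illustrated with the following short Python sketch. The function name `weight_coefficients` is hypothetical; the default of 0.8 is taken from the typical value mentioned in the text.

```python
def weight_coefficients(a, gamma=0.8):
    """Map predictor coefficients a_1..a_N to gamma^n * a_n (a[0] is a_1)."""
    return [(gamma ** (n + 1)) * a_n for n, a_n in enumerate(a)]


weighted = weight_coefficients([1.0, 1.0], gamma=0.5)
```

For a first-order filter with coefficient $a_1$, the single pole moves from $a_1$ to $\gamma a_1$, i.e., toward the origin, which is the inward movement of poles referred to above.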

[0035] The sampled error signal in the perceptually weighted domain, g(i), is:

$$g(i) = x(i) - \hat{x}(i) + \sum_{n=1}^{N} \gamma^n a_n\, g(i-n). \tag{3}$$

The error criterion of analysis-by-synthesis coders is formulated on a subframe-by-subframe basis. For a subframe
length of L samples, a commonly used criterion is:

$$\epsilon = \sum_{i=\hat{\imath}}^{\hat{\imath}+L-1} g^2(i), \tag{4}$$

where $\hat{\imath}$ is the first sample of the subframe. Note that this criterion weighs the excitation samples unevenly over the
subframe; the sample $x(\hat{\imath}+L-1)$ affects only $g(\hat{\imath}+L-1)$, while $x(\hat{\imath})$ affects all samples of g(i) in the present subframe.
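
For illustration, the recursion of equation (3) and a squared-error criterion over one subframe may be sketched as follows. The sketch assumes that the criterion is the sum of squared weighted-error samples; the function names and the `g_past` argument (weighted-error samples preceding the subframe, most recent last) are hypothetical.

```python
def weighted_error(x, x_hat, a, gamma=0.8, g_past=None):
    """g(i) = x(i) - x_hat(i) + sum_{n=1..N} gamma^n * a_n * g(i-n) over one subframe."""
    order = len(a)
    past = [0.0] * order if g_past is None else list(g_past)
    if len(past) < order:
        raise ValueError("g_past must hold at least len(a) samples")
    g = []
    for i in range(len(x)):
        feedback = 0.0
        for n in range(1, order + 1):
            # Negative i - n indexes the history from its most recent end.
            previous = g[i - n] if i - n >= 0 else past[i - n]
            feedback += (gamma ** n) * a[n - 1] * previous
        g.append(x[i] - x_hat[i] + feedback)
    return g


def squared_error(g):
    """Sum of squared weighted-error samples for the subframe."""
    return sum(value * value for value in g)


g = weighted_error([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], a=[0.5], gamma=0.5)
```

In the example, an error in the first excitation sample propagates to every later sample of g(i), whereas an error in the last sample would affect only the last value, which is the uneven weighting noted above.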
[0036] The criterion of equation (4) includes the effects of differences in x(i) and $\hat{x}(i)$ prior to $\hat{\imath}$, i.e., prior to the beginning
of the present subframe. It is convenient to define an excitation in the present subframe to represent this zero-input
response of the weighted synthesis filter:



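$$z(i) = \begin{cases} 0, & i < \hat{\imath}, \\ \sum_{n=1} \cdots, & \hat{\imath} \le i \le \hat{\imath}+N, \\ 0, & i > \hat{\imath}+N, \end{cases} \tag{5}$$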



where z(i) is the zero-input response in the present subframe of the perceptually-weighted synthesis filter when excited
with $x(i) - \hat{x}(i)$ prior to the present subframe.

[0037] In the time-domain, the spectral deemphasis by the factor $\gamma$ results in a quicker attenuation of the impulse
response of the all-pole filter. In practice, for a sampling rate of 8 kHz, and $\gamma$ = 0.8, the impulse response never has a
significant part of its energy beyond 20 samples.

[0038] Because of its fast decay, the impulse response of the all-pole filter
$1/(1 - \gamma a_1 z^{-1} - \cdots - \gamma^N a_N z^{-N})$ can be approximated
by a finite-impulse-response filter. Let $h_0, h_1, \ldots, h_{R-1}$ denote the impulse response of the latter filter. This allows
vector notation for the error criterion operating on the perceptually-weighted speech. Because the coders operate on
a subframe-by-subframe basis, it is convenient to define vectors with the length of the subframe in samples, L. For
example, for the excitation signal:

$$\mathbf{x}(\hat{\imath}) = [\, x(\hat{\imath}),\ x(\hat{\imath}+1),\ \ldots,\ x(\hat{\imath}+L-1) \,]^T. \tag{6}$$

Further, the spectral-weighting matrix H is defined as:
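$$H = \begin{bmatrix}
h_0 & 0 & \cdots & 0 \\
h_1 & h_0 & \ddots & \vdots \\
\vdots & h_1 & \ddots & 0 \\
h_{R-1} & \vdots & \ddots & h_0 \\
0 & h_{R-1} & & h_1 \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & h_{R-1}
\end{bmatrix}. \tag{7}$$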

H has dimensions $(L + R - 1) \times L$. Thus, the vector $H\mathbf{x}(\hat{\imath})$ approximates the entire response of the IIR filter
$1/(1 - \gamma a_1 z^{-1} - \cdots - \gamma^N a_N z^{-N})$ to the vector $\mathbf{x}(\hat{\imath})$. With these definitions an appropriate
perceptually-weighted criterion is:



With the current definition of H the error criterion of equation (8) is of the autocorrelation type (note that $H^T H$ is Toeplitz).
If the matrix H is truncated to be square $L \times L$, equation (8) equals equation (4), which is the more common covariance
criterion, as used in the original CELP.
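
The structure of the spectral-weighting matrix may be illustrated as follows. This Python sketch assumes that H is the usual convolution matrix built from the first R samples of the impulse response of the weighted all-pole filter; the function names are hypothetical, and the final lines merely check numerically that $H^T H$ is Toeplitz for one small case.

```python
def impulse_response(a, gamma, length):
    """First `length` samples of the impulse response of 1/(1 - sum gamma^n a_n z^-n)."""
    h = []
    for i in range(length):
        value = 1.0 if i == 0 else 0.0
        for n in range(1, len(a) + 1):
            if i - n >= 0:
                value += (gamma ** n) * a[n - 1] * h[i - n]
        h.append(value)
    return h


def weighting_matrix(h, subframe_len):
    """(L + R - 1) x L convolution matrix with entry h[r - c] at row r, column c."""
    taps = len(h)
    return [[h[r - c] if 0 <= r - c < taps else 0.0 for c in range(subframe_len)]
            for r in range(subframe_len + taps - 1)]


def gram(matrix):
    """Return H^T H as a list of rows."""
    cols = len(matrix[0])
    return [[sum(row[p] * row[q] for row in matrix) for q in range(cols)]
            for p in range(cols)]


h = impulse_response([0.5], gamma=0.5, length=3)
H = weighting_matrix(h, subframe_len=3)
G = gram(H)
is_toeplitz = all(G[p][q] == G[p + 1][q + 1] for p in range(2) for q in range(2))
```

Dropping the last R - 1 rows of `H` gives the square, lower-triangular matrix associated with the covariance form of the criterion.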

An Illustrative Embodiment for CELP Coding 

[0039] Figure 2 presents an illustrative embodiment of the present invention as it may be applied to CELP coding.
A speech signal in digital form, s(i), is presented for coding. Signal s(i) is provided to a conventional linear predictive
analyzer 100 which produces linear predictive coefficients, $a_n$. Signal s(i) is also provided to a conventional linear
prediction filter (or "short-term predictor" (STP)) 120, which operates according to a process described by Eq. (1), and
to a conventional delay estimator 140.

[0040] Delay estimator 140 operates to provide an estimated delay value to the adaptive codebook processor 150.
To determine delay information valid at a particular sample time, delay estimator 140 performs conventional correlation
of a window of samples of s(i), centered about the particular sample in question, with each of a multiplicity of windows
of the same length. The windows involved in this correlation are illustrated in Figure 3.

[0041] Figure 3 presents the demarcations for frames (F) and constituent subframes (SF) of samples of signal s(i)
(actual sample values of s(i) have been omitted for clarity). Shown are three frames, $F_{n-1}$ (the past frame), $F_n$ (the
current frame), and $F_{n+1}$ (the next frame). Each of these frames comprises 160 samples of signal s(i).

[0042] The location of frame boundaries is provided by time shift processor 200 discussed below. Time shift processor
200 provides a sample location dp1 indicating the end of a subframe of original speech signal, s(i). Delay estimator
140 simply keeps track of the subframe boundaries of original speech to know when a frame boundary is reached
(such a frame boundary is at an integral multiple of subframe boundaries). Because delay estimator 140 operates on
a frame of speech prior to the operation of the time shift processor 200 on the same frame of speech, delay estimator
140 must predict the position of future frame boundaries. It does this by adding a fixed number of samples equal to a
frame length (e.g., 160 samples) to the last frame boundary provided by the time shift processor 200.
[0043] Assume delay estimator 140 is to determine a value for delay, M, valid at the boundary between the current
and next frames of s(i), $M(FB_{n+1})$. To do this, estimator 140 stores in its memory a window of 160 signal samples
surrounding this boundary (estimator 140 must wait to receive samples of signal s(i) valid in the next frame). This
window of samples is denoted as window A. Next, estimator 140 performs a correlation computation with samples of
s(i) in window $B_1$, the first of 140 other windows of s(i). Window $B_1$ is a window of 160 samples beginning 20 samples
earlier than the beginning of window A and ending 20 samples earlier than the end of window A. A correlation value
associated with window $B_1$ is stored in memory. The correlation process is repeated with window $B_2$, a 160 sample
window beginning one sample earlier in time than window $B_1$. Correlation computations are performed for each of the
next 138 windows, each window distinct from the one before by one sample.

[0044] As shown in Figure 3, estimator 140 must have enough memory to store what is essentially two frames of
signal samples. If D is the largest delay value allowed, then the memory should extend D samples prior to the beginning
of window A. When D = 160, in order to compute an estimated delay valid at $FB_{n+1}$, estimator 140 must store samples
of s(i) from the beginning of the third subframe, $SF_2$, of frame $F_{n-1}$ to the end of the second subframe, $SF_1$, of frame
$F_{n+1}$. Delay, M, is determined by estimator 140 based on the B window of samples having the greatest correlation with
the samples of window A. That is, delay is equal to the number of samples that the most correlated B window is shifted
in time from window A.
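
The open-loop delay search described above may be sketched in Python as follows. The window length of 160 samples, the smallest shift of 20 samples and the count of 140 candidate windows follow the text; the use of a normalized correlation, the function name `estimate_delay` and its argument names are assumptions of this sketch, since the text specifies only a "conventional correlation".

```python
import math


def estimate_delay(s, center, window=160, min_lag=20, num_lags=140):
    """Return the shift (in samples) of the earlier window best correlated with window A.

    Window A holds `window` samples centered on index `center`; candidate
    windows B start min_lag, min_lag + 1, ... samples earlier than A.
    """
    start = center - window // 2
    max_lag = min_lag + num_lags - 1
    if start - max_lag < 0 or start + window > len(s):
        raise ValueError("not enough samples stored around the requested position")
    a = s[start:start + window]
    energy_a = sum(v * v for v in a)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        b = s[start - lag:start - lag + window]
        energy_b = sum(v * v for v in b)
        denom = math.sqrt(energy_a * energy_b)
        score = sum(p * q for p, q in zip(a, b)) / denom if denom > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# A test tone with a period of 50 samples: the best shift is a multiple of 50.
tone = [math.sin(2.0 * math.pi * i / 50.0) for i in range(600)]
delay = estimate_delay(tone, center=400)
```

For a strictly periodic input every multiple of the period correlates equally well, so the sketch may return any such multiple within the search range; real speech generally does not exhibit exact ties.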

[0045] The delay estimator 140 determines a frame boundary delay estimate, M, once per frame. Delay estimator
140 further determines a delay value, m, valid at a fixed number of samples into each subframe (e.g., 10 samples), by
conventional linear interpolation of delay values valid at frame boundaries. For this purpose, the delay value required
at 10 samples into the next frame is set equal to the delay value at the frame boundary.
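
The interpolation of subframe delay values may be illustrated as follows. The sketch assumes a frame of 160 samples divided into four equal subframes and delay values valid 10 samples into each subframe; the function name `subframe_delays` and its parameters are hypothetical.

```python
def subframe_delays(m_start, m_end, frame_len=160, num_subframes=4, offset=10):
    """Linearly interpolate between the delays valid at the two frame boundaries.

    Returns one delay per subframe, valid `offset` samples past the start of
    that subframe.
    """
    subframe_len = frame_len // num_subframes
    return [m_start + (m_end - m_start) * (k * subframe_len + offset) / frame_len
            for k in range(num_subframes)]


delays = subframe_delays(40.0, 56.0)
```

With boundary delays of 40 and 56 samples, the interpolated values are 41, 45, 49 and 53 samples, i.e., the delay advances by one sixteenth of the boundary difference for every 10 samples.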

[0046] The timing associated with the delay values provided by delay estimator 140 is illustrated in Figure 4. As
shown in the Figure, delay values valid at the frame boundaries surrounding frame n are $M(FB_n)$ and $M(FB_{n+1})$. Delay
values valid at a fixed number of samples past each subframe boundary (SB) within frame n are indicated as $m_n(k)$,
k = 0, 1, 2, 3. These values of $m_n(k)$ are determined by interpolation as discussed above. Delay values $m_n(k)$ are provided
to the adaptive codebook processor 150. As will be discussed below, the adaptive codebook processor 150 uses this
delay information to provide an adaptive codebook contribution to the time shift processor 200.

The Adaptive Codebook Processor 

[0047] The adaptive codebook processor 150 provides an estimate of a current subframe of speech (to be coded)
to the time shift processor 200 based on delay estimates, $m_n(k)$, from the delay estimator 140 and past reconstructed
speech signals from the CELP process. The adaptive codebook processor 150 operates by using delay values, $m_n(k)$,
to determine a delay pointer to past reconstructed speech signals stored in the memory of processor 150. Selected
past speech samples, $\hat{x}(i)$, are then provided to processor 200 as an estimate of the current subframe of speech to be
coded. For each subframe of original speech to be coded, adaptive codebook processor 150 provides a corresponding
subframe of speech samples plus a fixed number of extra samples which extend into the next subframe. Illustratively,
this fixed number of extra samples equals 10.

[0048] Figure 5 presents an illustrative realization of the adaptive codebook processor 150. The realization comprises
processor 155 and RAM 157. Processor 155 receives past reconstructed speech signals, $\hat{x}(i)$, and stores them in RAM
157 for use in computing current and next subframe speech samples. Processor 155 also receives delay values,
$m_n(k)$, from delay estimator 140 which are used in the computation of such sample values. Processor 155 provides such
computed sample values, $\hat{x}(i)$, to time shift processor 200 for use in the generation of trial original signals.

[0049] Each sample of speech provided to the time shift processor 200 is determined as follows. First, a delay pointer
d(i), valid for the sample in question (that is, the sample to be provided to the time shift processor 200) is determined
by processor 155. This is done by interpolating between a pair of delay values, $m_n(k)$ (provided by delay estimator
140), which surround the sample in question. The interpolation procedure used by processor 155 to provide the delay
pointers, d(i), is conventional linear interpolation of the provided delay values, $m_n(k)$. Next, processor 155 uses the
delay pointer, d(i) (valid for the sample in question), as a pointer backward in time to an earlier speech sample which
is to be used in the current frame as the value of the sample in question. Such earlier samples are stored in RAM 157.
In general, the delay pointer, d(i), will not point exactly to a past sample. Instead, d(i) will likely point somewhere between
consecutive past samples. Under such circumstances, processor 155 interpolates past samples to determine a past
sample value valid at the moment in time to which the delay pointer refers. The interpolation technique used by processor
155 to determine past sample values is conventional bandlimited interpolation, such as that described by Rabiner
and Schafer, Digital Processing of Speech Signals, pp. 26-31 (1978). The interpolation filter realized by processor 155
illustratively employs 20 taps on either side of the past sample closest to the time indicated by the delay value.
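
One possible form of such an interpolator is sketched below in Python. The sketch uses a sinc kernel tapered by a Hann window and 20 taps on either side of the nearest stored sample; the choice of window, the function name `bandlimited_sample` and the handling of samples outside the buffer (treated as zero) are assumptions of this sketch and are not taken from the cited reference.

```python
import math


def bandlimited_sample(samples, t, taps=20):
    """Estimate the signal value at fractional index t by windowed-sinc interpolation."""
    nearest = int(round(t))
    total = 0.0
    for k in range(nearest - taps, nearest + taps + 1):
        if 0 <= k < len(samples):
            u = t - k
            sinc = 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)
            # Hann taper that reaches zero just beyond the outermost tap.
            taper = 0.5 * (1.0 + math.cos(math.pi * u / (taps + 1)))
            total += samples[k] * sinc * taper
    return total


# A slowly varying test tone, sampled at integer indices.
tone = [math.sin(2.0 * math.pi * k / 40.0) for k in range(200)]
at_sample = bandlimited_sample(tone, 100.0)
between_samples = bandlimited_sample(tone, 100.5)
```

When the delay pointer falls exactly on a stored sample, the kernel reduces (to within rounding) to that sample alone; between samples it returns a weighted combination of the neighbouring stored values.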
[0050] Figures 6a-c illustrate the process by which the adaptive codebook processor 150 selects past samples for
use in a current (and next) subframe. For clarity of presentation, Figures 6a-c assume that a computed value of d(i)
points exactly to a past sample value, rather than to a point in between past sample values. Also, it will be assumed without
loss of generality that the delay values are shorter than the subframe length.

[0051] As shown in Figure 6a, the samples to be provided to time shift processor 200 include samples in a current
subframe and a fixed number of samples in the next subframe. Processor 155 receives a delay value for the current
subframe, $m_{curr}$, from the delay estimator 140 and has stored in its memory 157 a delay value for the previous subframe,
$m_{prev}$. To determine the value of each sample, $\hat{x}(i)$, of the current subframe located prior to the point at which $m_{curr}$ is
valid, processor 155 determines a delay pointer, d(i), valid at the sample time i of the sample in question. This is done
by linear interpolation to the point in time when the sample is valid using delay $m_{curr}$ and the last delay value received
from estimator 140, $m_{prev}$. After this delay pointer, d(i), has been determined, processor 155 computes by bandlimited
interpolation of samples in its memory 157 the sample value valid at a point in time which is d(i) samples prior to the
sample in question, i.e., $\hat{x}(i - d(i))$. This sample value is then inserted into a memory location reserved for the current
subframe sample in question.

[0052] In the example of Figure 6, the subframe length is longer than the delay values. The process by which a given
sample in the current subframe is determined is based on determining a delay pointer and looking backward in time
for a sample value to use as the given sample value. Thus, segments of reconstructed speech may be essentially
repeated using bandlimited interpolation within the current subframe. So, for example, in Figure 6b, a given sample,
$\hat{x}(i)$, takes its value from a previously determined sample which precedes it in time by a delay d(i), i.e., $\hat{x}(i - d(i))$. This
delay is determined as described above, except the delay values which are interpolated are the delays from the current
subframe, $m_{curr}$, and the next subframe, $m_{next}$, since these delays surround sample $\hat{x}(i)$. Repeating signal segments
with constant gain when the delay is shorter than the subframe length is what distinguishes the adaptive codebook
procedure from LTP filtering procedures.

[0053] As shown in Figure 6c, the extra samples in the next subframe are determined in the same fashion as those
in Figure 6b. In this case, samples from the current subframe are used to provide values for samples in the next
subframe.

[0054] In practice, the above-described procedure of the adaptive codebook processor 150 may be realized by first
computing all delay pointer values, d(i), for all sample times of the current and portion of the next subframe in question.
Then, for each sample time, i, of the present or next subframe needing a sample value, d(i) is used as a reference to
a past time, i - d(i), at which a sample is "located." In general, there will not be a sample located at time i - d(i). Therefore,
bandlimited interpolation of samples surrounding time i - d(i) will be required. Once the bandlimited interpolation is performed
generating a sample value at i - d(i), that sample value is assigned to time i. This process may be repeated in
a recursive process for each sample in the present or next subframe as needed.
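
The recursive procedure just described may be sketched as follows. For brevity the sketch substitutes simple linear interpolation between the two neighbouring samples for the bandlimited interpolation of the illustrative embodiment; that substitution, the function name `adaptive_codebook` and the requirement that each delay pointer be at least one sample are assumptions of the sketch.

```python
import math


def adaptive_codebook(past, delay_pointers):
    """Map past reconstructed samples into the present, one sample at a time.

    `past` holds reconstructed samples up to the start of the current
    subframe (most recent last); `delay_pointers` holds d(i) for each sample
    to be produced. Newly produced samples become available to later ones,
    so segments repeat when the delay is shorter than the subframe.
    """
    buffer = list(past)
    start = len(buffer)
    for i, d in enumerate(delay_pointers):
        if d < 1.0:
            raise ValueError("each delay pointer must be at least one sample")
        t = start + i - d  # (possibly fractional) time the pointer refers to
        if t < 0.0:
            raise ValueError("delay pointer reaches before the stored samples")
        lower = int(math.floor(t))
        frac = t - lower
        if frac == 0.0:
            value = buffer[lower]
        else:
            # Stand-in for bandlimited interpolation (assumption of this sketch).
            value = (1.0 - frac) * buffer[lower] + frac * buffer[lower + 1]
        buffer.append(value)
    return buffer[start:]


repeated = adaptive_codebook([1.0, 2.0, 3.0], [3.0] * 5)
fractional = adaptive_codebook([0.0, 2.0], [1.5])
```

With a constant delay of three samples and five samples requested, the last three stored samples are repeated, which illustrates the repetition of signal segments when the delay is shorter than the subframe.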

[0055] Once the adaptive codebook processor 150 has determined samples for use in the current subframe and a
fixed portion of the next subframe, those samples are provided to the time shift processor 200 for use as a basis for
determining a shifted original signal for use in a CELP coding process. The samples provided to the time shift processor
are referred to as the adaptive codebook contribution to the analysis-by-synthesis process of CELP coding.
[0056] It should be understood that an all-pole filter may be used in place of the adaptive codebook realization of an
LTP. However, the adaptive codebook realization is particularly well suited to situations where, as illustrated here, delay
values are generally less than the length of a subframe. This is because an adaptive codebook realization does not
require a determined value of LTP gain (here, codebook gain) simply to provide an LTP contribution in the current
subframe. This gain may be determined later. Unlike the case of the adaptive codebook, an all-pole filter realization of
an LTP requires the solution of a nonlinear equation to obtain a value for the filter gain when delay is less than subframe
length.

The Time-Shift Processor 

[0057] The time shift processor 200 determines how to shift segments of an original speech signal such that it may
be coded (by an analysis-by-synthesis coding process, such as CELP) with less error than if the original signal were
always used for coding. To time-shift an original speech signal, the time shift processor 200 first identifies within the
original speech signal a local maximum of original speech signal energy. In the illustrative embodiment described
below, processor 200 selects a plurality of overlapping segments of the original speech signal, each of which includes
the identified local maximum signal energy. Processor 200 compares each selected segment with a segment of the
adaptive codebook contribution (provided by the adaptive codebook processor 150). This comparison is made to
determine the original speech signal segment which most closely matches the segment of the adaptive codebook
contribution. When the segment of the original speech signal which best matches the segment of the adaptive codebook
contribution is determined, this segment of original speech is used in the formation of a shifted original speech signal
for coding by a CELP process.
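
The comparison of overlapping segments may be sketched in Python as follows. The sketch considers only whole-sample shifts within a small range, scores each candidate segment by normalized cross-correlation with the adaptive codebook segment, and omits the identification of the local energy maximum; the function name `best_shift`, its arguments and the scoring measure are assumptions of the sketch rather than features of the embodiment.

```python
import math


def best_shift(original, start, target, max_shift=2):
    """Return the shift whose segment of `original` best matches `target`.

    Candidate segments begin at start + shift, for shift in
    [-max_shift, max_shift], and have the same length as `target`.
    """
    length = len(target)
    energy_t = sum(v * v for v in target)
    best, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        begin = start + shift
        if begin < 0 or begin + length > len(original):
            continue  # candidate segment falls outside the stored signal
        segment = original[begin:begin + length]
        energy_s = sum(v * v for v in segment)
        denom = math.sqrt(energy_s * energy_t)
        score = sum(p * q for p, q in zip(segment, target)) / denom if denom > 0.0 else 0.0
        if score > best_score:
            best, best_score = shift, score
    return best


# The pulse in the original lies one sample later than the nominal position.
original = [0.0, 0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0, 0.0]
shift = best_shift(original, start=2, target=[1.0, 5.0, 1.0])
```

In the example the segment beginning one sample later than the nominal position coincides with the target, so a shift of one sample is selected.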

[0058] As shown in Figure 2, the time shift processor 200 receives an original residual speech signal, x(i), from the
STP 120, and provides a time shifted residual, $\tilde{x}(i)$, for use in the CELP coding process. As shown in Figure 7, time
shift processor 200 illustratively comprises processor 210; conventional buffer memories 220, 230, and 240; conventional
ROM 250 for the storage of processor 210 programs; and conventional RAM 260 for the storage of processor
210 results.

[0059] The operation of time shift processor 200 will be explained with reference to Figure 8, which presents an
illustrative starting point for processor 200 operation on speech signals, and Figure 9, which presents an illustrative
flow diagram for the operation of processor 210.

[0060] As shown in Figure 8, processor 200 begins operation having received a buffer 220 of reconstructed speech
representing the adaptive codebook contribution from the adaptive codebook processor 150. As discussed above, this
adaptive codebook contribution comprises samples of past reconstructed speech which have been mapped into the
current subframe and a fixed portion of the next subframe (see Figure 6 and associated discussion) by processor 150.
This buffer of reconstructed speech is loaded into RAM 260 for use by processor 210. A pointer, dp1, is maintained
by processor 210 and stored in RAM 260 to indicate the end of the latest subframe for which both the adaptive codebook
and FSCB contributions have been determined. The length of such subframes, subframe_l, is constant and maintained
in memory, e.g., ROM 250. Based on prior operation of the processor 210, a time shifted residual, x̃(i), has been created
up to a point in time identified by a pointer dpm (pointer dpm is always greater than or equal to dp1). Moreover, a
portion of the original residual signal, x(i), including that associated with the current subframe, has been received by
buffer 230 and stored in RAM 260. Processor 210 maintains (in RAM 260) a value, acc_shift, representing the sample
displacement (or accumulated shift) between the last sample in the shifted residual signal and a corresponding sample
in the original residual speech signal. (At initialization, the above-described status is modified to include dpm = dp1
and acc_shift = 0.)

[0061] Given this set of conditions, the time shift processor 200 operates to determine a shifted residual signal for
the current subframe (and possibly a portion of the next subframe, depending on the circumstances) which best matches
the adaptive codebook contribution.

[0062] Figure 9 presents a flow diagram illustrating the operation of the processor 210 of Figure 7. According to
Figure 9, the first task performed by processor 210 is to determine whether the time shifted residual, x̃(i), has been
extended up to or beyond the end of the current subframe. As shown in Figure 8, the extent to which the time shifted
residual has been extended is given by pointer dpm. The end of the current subframe is indicated by the sum of current
subframe pointer dp1 and the fixed subframe length, subframe_l. If dpm < dp1 + subframe_l, further processing is
performed to extend the shifted residual; else, no further shift processing is required for the current subframe (see step
305).

[0063] If further shift processing is required, processor 210 determines the location of maximum energy in a segment
of the original residual speech signal, x(i) (see steps 310-375). Ordinarily, the location of maximum energy corresponds
to the location of a pitch-pulse of voiced speech. However, this is not necessarily the case. Regardless of whether the
maximum energy is associated with a pitch-pulse or some other signal feature (such as, e.g., energetic noise), the
search for the maximum energy location is made so that shifts in the original signal will be made to best align an
energetically significant feature in the original speech with a significant feature in the adaptive codebook contribution.
[0064] The beginning of the segment of the original residual speech signal to be searched is defined with respect to
a pointer to an original residual speech signal sample. This sample corresponds to the sample identified by pointer
dpm in the shifted residual signal. This residual speech signal sample pointer, dpm', is determined as the sum of sample
pointer dpm and the accumulated shift between x(i) and x̃(i): dpm' = dpm + acc_shift (see step 310). The beginning of
the interval to be searched, designated by the pointer offset, is then computed (see step 315). Next, the length of the
interval to be searched is defined (see step 320).

[0065] The location of maximum energy in the segment of x(i) is then determined (see step 325). This determination
is made with use of a five-sample window. This window, centered about the i-th sample of the original residual speech
signal, defines samples of the original residual used in an energy computation. The energy at sample location i is
determined by the sum of the squares of the samples in the window. The energy at the (i + 1)th sample location is
determined in the same fashion, but with the window moved one sample later in time such that the center window
location now contains the (i + 1)th sample. Again, the energy is determined as the sum of the squares of the sample
values in the window. The energy of each sample location in the segment is determined in the same fashion. The
energy of samples in a current window may be determined as the energy of an immediate past window of samples
minus the energy of the sample shifted out of the window plus the energy of the sample shifted into the window. The
sample location having associated with it the maximum energy determined in this fashion is identified by a pointer
location.
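The recursive window update described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the appendix routine: the name find_max_energy and the constant HALF_WIN are hypothetical, and the caller is assumed to guarantee that the signal holds HALF_WIN valid samples on either side of the searched segment.

```c
#include <assert.h>
#include <stddef.h>

#define HALF_WIN 2  /* five-sample window: centre sample plus two on each side */

/* Return the index in [start, start + length) whose five-sample window has
 * the largest energy; that energy is written to *max_energy.
 * Assumes signal[start - HALF_WIN .. start + length - 1 + HALF_WIN] is valid. */
int find_max_energy(const float *signal, int start, int length,
                    float *max_energy)
{
    int i, best;
    float energy = 0.0f;

    assert(signal != NULL && max_energy != NULL);
    assert(length > 0 && start >= HALF_WIN);

    /* Energy of the first window, computed directly. */
    for (i = start - HALF_WIN; i <= start + HALF_WIN; i++)
        energy += signal[i] * signal[i];
    best = start;
    *max_energy = energy;

    /* Each later window is obtained from the previous one: add the energy of
     * the sample shifted in, subtract the energy of the sample shifted out. */
    for (i = start + 1; i < start + length; i++) {
        float in = signal[i + HALF_WIN];
        float out = signal[i - 1 - HALF_WIN];
        energy += in * in - out * out;
        if (energy > *max_energy) {
            *max_energy = energy;
            best = i;
        }
    }
    return best;
}
```

Because the comparison is strict, the earliest location is kept when several windows share the same energy.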

[0066] Once the segment of the original residual signal, x(i), has been searched for the sample having the maximum
energy in the segment, processor 210 determines if this maximum energy sample is one which has been considered
in the previous subframe (and thus not a maximum of interest). This is done by determining whether location precedes
dpm' (see step 330).

[0067] If location precedes dpm', another search is performed by processor 210. In this case, however, the segment
searched begins at a sample specified as offset = location + 0.75 delay (see step 335), and is of duration 0.5 delay.
The value delay is provided by delay estimator 140 as the delay valid at the beginning of the current subframe, M(FB_n).
Since significant pitch-pulse energy features in the original residual signal are likely separated by one delay period,
the computation of a new offset allows the search to skip ahead (0.75 delay) and likely find a maximum energy feature
within a segment of length 0.5 delay. The sample location of maximum energy is determined as described above with
reference to step 325 (see step 345).

[0068] If location does not precede dpm', then the first pitch-pulse beyond dpm' has likely been found, and the flow
of control jumps to step 350.

[0069] If the location of maximum signal energy determined at either steps 325 or 345 follows dpm' + delay, then it
is likely, but not certain, that a pitch-pulse located subsequent to dpm' but prior to dpm' + delay has been missed by
the searches performed to this time by processor 210 (see step 350). In this case, another segment of the original
residual signal is defined and the location of the maximum energy therein is determined. If the location of maximum
signal energy determined at either steps 325 or 345 precedes dpm' + delay, then the flow of control jumps to step 380.

[0070] Assuming step 350 results in the need to search another segment of the original residual speech signal, this
segment is determined to begin at offset = location - 1.25 delay (see step 355) and extend for length = 0.5 delay (see
step 360). The location of the maximum energy is determined as described above with reference to step 325, but the
sample pointer to this location is saved as location2 (see step 365).

[0071] If the location of maximum energy (location2) is subsequent to dpm', then location2 identifies the location
of the first pitch-pulse beyond dpm', and location is set equal to location2 (see steps 370 and 375). If, on the other
hand, the location of maximum energy is not beyond dpm', then location2 is not the first pitch-pulse beyond dpm', and
location remains set to the value it was assigned at either step 325 or 345 (since under such circumstances, pointer
location is not overwritten by the operation of step 365).

[0072] At this point, the location of the first pitch-pulse (or energy maximum) in a segment of the original residual
has been found. Now, a segment of the original residual signal containing this location will be defined by processor
210 through the setting of certain pointers to samples in the signal. These pointers specify the beginning (sfstart) and
end (sfend) of this segment containing the determined location. This segment is defined for later use as part of the
process of aligning (or shifting) original residual speech to best match an adaptive codebook contribution.

[0073] First, default values for the segment pointers are set by processor 210. Pointer sfstart is set equal to dpm',
the sample location corresponding to dpm + acc_shift (see step 380). This value for sfstart corresponds to an additional
accumulated shift between x(i) and x̃(i) of zero. That is, use of a section of x(i) beginning at dpm' (= sfstart) adds
nothing to the accumulated shift between the original and shifted residual signals.

[0074] Pointer sfend is set to location + extra. The value extra is a constant stored in memory (e.g., ROM 250) and
is equal to a fixed number of samples, e.g., 10 samples. Use of extra guarantees that the pitch-pulse (or maximum
energy) of original residual speech will not fall at the end of the segment of the original residual being identified by
these pointers (see step 380).

[0075] The default value of pointer sfend may be overwritten under certain circumstances. If the default value of
sfend would mean that the segment of original residual speech would extend significantly beyond the end of the adaptive
codebook contribution, the pointer sfend is set to end at dp1' + subframe_l + extra, where subframe_l is a constant
equalling the number of samples in a fixed adaptive codebook subframe as discussed above (see steps 385 and 390).

[0076] The value of sfend may be further overwritten if the location of the identified pitch-pulse (or major energy)
is significantly beyond the end of the adaptive codebook subframe. Under such circumstances the segment is deemed
to end at the end of the adaptive codebook subframe boundary (see steps 395 and 400). Note that such a definition
of sfend means that the location of the pitch-pulse (or major energy) is later than the end of the segment. Therefore,
the segment no longer contains the pitch-pulse.
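The setting of the segment pointers in steps 380-400 may be sketched as below. The structure and function names are hypothetical, all pointers are assumed to be expressed as indices into the original residual (dpm_p and dp1_p standing for dpm' and dp1'), and the threshold extra / 2 used for "significantly beyond" is an assumption taken from the appendix listing rather than from the description.

```c
#include <assert.h>

/* Hypothetical container for the segment pointers of steps 380-400. */
struct segment {
    int sfstart;  /* first sample of the segment */
    int sfend;    /* segment covers samples sfstart .. sfend - 1 */
};

/* dpm_p and dp1_p stand for dpm' and dp1'; location is the energy maximum
 * found by the preceding searches. */
struct segment define_segment(int dpm_p, int dp1_p, int location,
                              int subframe_l, int extra)
{
    struct segment s;
    int subframe_end = dp1_p + subframe_l;

    assert(subframe_l > 0 && extra >= 0);

    /* Step 380: default values. */
    s.sfstart = dpm_p;
    s.sfend = location + extra;

    /* Steps 385-390: do not extend far beyond the adaptive codebook
     * contribution. */
    if (s.sfend > subframe_end + extra)
        s.sfend = subframe_end + extra;

    /* Steps 395-400: the maximum lies well beyond the subframe boundary, so
     * the segment ends at the boundary and no longer contains the maximum.
     * The extra / 2 threshold is assumed from the appendix listing. */
    if (location >= subframe_end + extra / 2)
        s.sfend = subframe_end;

    return s;
}
```

With subframe_l = 20 and extra = 10, for example, a maximum 10 samples into the subframe yields a segment ending 20 samples in, whereas a maximum 50 samples in yields a segment that stops at the subframe boundary.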

[0077] At this point, the location of the identified pitch-pulse (or maximum energy) is checked to determine whether
it falls outside a range of samples beginning at sfstart and ending at sfend - 1 (see step 405). If so, x̃(i) may be
extended with samples obtained with bandlimited interpolation of x(i) without need for changing acc_shift (that is, flow
of control may jump to step 480). Otherwise, shifting is performed (see steps 410-475).

[0078] Assuming the location of the identified pitch-pulse (or major energy) is not outside the range defined above,
a set (or segment) of L samples of x(i) (within a specified range of samples about the segment defined by sfstart and
sfend) which most closely matches an L-length section of the adaptive codebook contribution (which begins at dpm
and ends at dpm + L) is determined by processor 210.

[0079] This L-length segment of x(i) may comprise those L samples of the segment of x(i) defined by sfstart and
sfend, but may also comprise samples (obtained by bandlimited interpolation) of a segment which is shifted with respect
to sfstart and sfend, depending upon how closely a given L-length segment of x(i) matches the L-length section of
x̂(i). As predicates to this determination, a limit on the range of possible sample shifts (see step 410) and a sample
length, L, are determined (see step 415). The determination of the "closeness" (i.e., a measure of similarity) between
L-length segments of x(i) and the adaptive codebook contribution, x̂(i), is made through a cross-correlation process of
these signals (see step 425) (it will be understood that other measures of similarity, such as a difference or error signal,
may also be used). The selection of L-length segments of x(i) for use in a cross-correlation with a segment of x̂(i) may
be advantageously described with reference to Figure 10.

[0080] Figure 10 presents an illustrative segment of original residual speech signal x(i) which was located as
described previously with reference to steps 310-400. The segment begins at sample sfstart and ends at sample
sfend. The pitch-pulse is at sample location, with the distance between samples location and sfend equal to extra. As
discussed above, the samples of x(i) falling within the segment defined by pointers sfstart and sfend correspond to
a shift of zero. Shifted segments of x(i) are defined with respect to this zero shift position. Each shifted segment is of
length L and begins (and ends) a certain positive or negative number of sample lengths (or fractions of sample lengths)
with respect to the zero shift position. Expressed another way, each shifted segment begins at sfstart + shift and ends
at sfend + shift. As shown in Figure 10, the range of possible values for shift is ±limit.

[0081] So, for example, one possible shift would be shift = -limit. In this case, the L-length segment of x(i) defined
by such a shift would begin at location sfstart - limit and end at location sfend - limit. Similarly, another possible shift
would be shift = +limit. In this case, the L-length segment of x(i) defined by such a shift would begin at location
sfstart + limit and end at location sfend + limit. As mentioned above, ±limit specifies a range of possible shifts. Therefore,
shift may take on values in the range -limit ≤ shift ≤ +limit, given a shift step size (i.e., shift precision) of sstep. Step size
sstep may be set illustratively to 0.5 samples. Sample values resulting from fractional shifts are determined by
conventional bandlimited interpolation. A plurality of 2×limit/sstep segments of the original residual signal x(i) may be
defined in this way. All are L-length segments between ±limit, wherein each segment overlaps its neighbor segments
and is distinct from its nearest neighbor segments by sstep samples.

[0082] The relative sizes of limit and extra have an effect on system performance. For example, as extra is made
larger, greater coding delay is introduced to the system. As extra is made smaller, coding delay is reduced, but the
probability that shift will take on a value which excludes a pitch-pulse from the L-length segment of x(i) increases. This
exclusion, when it occurs, causes audible distortion in the speech signal. The probability of exclusion is also increased
as limit is made larger. To help ensure that exclusion does not occur, the value of limit should be less than the value of
extra. For example, if the value of extra is 10, limit may be set to 6.

[0083] For each such L-length segment of x(i) thus identified, a measure of similarity between the segment and an
L-length segment of the adaptive codebook contribution, x̂(i), is computed. This computation is illustratively a
cross-correlation. The adaptive codebook segment used for each cross-correlation begins at dpm and ends at dpm + L (see
Figure 8). The cross-correlation is performed with a step size equal to sstep (should sstep equal a non-integer value,
conventional bandlimited interpolation of x(i) is performed in advance to provide the requisite sample values for the
segments of x(i) and x̂(i)). Each cross-correlation results in a cross-correlation value (i.e., the measure of similarity).
All such cross-correlations form a set of cross-correlation values separated in time by sstep. Each cross-correlation
value of the set is associated, therefore, with a shift corresponding to the L-length segment of x(i) used in the
computation of that value.
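A simplified sketch of the set of cross-correlations is given below. It rests on several assumptions that are not part of the description: the function names are hypothetical, fractional sample values are obtained by linear interpolation as a plain stand-in for conventional bandlimited interpolation, and both end points of the range ±limit are evaluated, so that 2×limit/sstep + 1 values are produced.

```c
#include <assert.h>
#include <stddef.h>

/* Value of x at fractional position pos, by linear interpolation (a simple
 * stand-in for bandlimited interpolation). Requires pos >= 0 and x[k + 1]
 * to be a valid sample, where k is the integer part of pos. */
static float sample_at(const float *x, float pos)
{
    int k = (int)pos;               /* truncation equals floor since pos >= 0 */
    float frac = pos - (float)k;
    return (1.0f - frac) * x[k] + frac * x[k + 1];
}

/* Cross-correlate the L-length adaptive codebook segment, target, with the
 * L-length segment of x beginning at sfstart + shift, for
 * shift = -limit, -limit + sstep, ..., +limit. The value for the n-th shift
 * is stored in corr[n]; the index of the largest value is returned. */
int correlate_shifts(const float *x, int sfstart, const float *target,
                     int L, float limit, float sstep,
                     float *corr, int corr_size)
{
    int n_shifts, n, j, best = 0;

    assert(x != NULL && target != NULL && corr != NULL);
    assert(L > 0 && sstep > 0.0f && limit >= 0.0f);
    assert((float)sfstart - limit >= 0.0f);

    n_shifts = (int)(2.0f * limit / sstep + 0.5f) + 1;
    assert(n_shifts <= corr_size);

    for (n = 0; n < n_shifts; n++) {
        float shift = -limit + (float)n * sstep;
        float sum = 0.0f;
        for (j = 0; j < L; j++)
            sum += target[j] * sample_at(x, (float)(sfstart + j) + shift);
        corr[n] = sum;
        if (sum > corr[best])
            best = n;
    }
    return best;
}
```

The shift associated with index n is -limit + n × sstep; with limit = 3 and sstep = 0.5, index 10 therefore corresponds to a shift of +2 samples.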

[0084] Once the set of cross-correlation values is determined, the segment of the original residual signal having the
greatest cross-correlation with the adaptive codebook segment is determined with an increased time resolution (see
step 450). Illustratively, this is done by determining a second order polynomial curve for each set of three consecutive
cross-correlation values (a set of three values is distinct from its nearest neighboring sets by one value). The middle
value of these three cross-correlation values in a set corresponds to a shifted original residual signal as described
above. The set of three cross-correlation values, and thus the associated polynomial curve, is identified by this middle
value and its associated shift. For each such curve, a maximum and the location of that maximum (loc_max) is
determined. (If loc_max is outside the range of the three values, the three values and associated curves are disregarded.)
The curve having the greatest maximum value identifies the shift of the original residual signal which produces the
best match with the segment of the adaptive codebook contribution.

[0085] The shift of the original residual signal producing the best match is refined with knowledge of the location of
the maximum of the polynomial curve having the greatest maximum. With the location of the maximum defined with
respect to the location of the middle of the three cross-correlation values associated with the curve (i.e., a value of
shift), shift may be refined as

shift = shift + sstep * loc_max.
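The second order polynomial fit and the refinement of shift may be sketched as follows. The function names are hypothetical, loc_max is assumed to be measured in units of sstep relative to the middle value (so that the range of the three values is -1 to +1), and the return value signalling that no acceptable curve was found is an addition made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Fit a parabola through (-1, c_prev), (0, c_mid), (+1, c_next). Returns 1
 * and writes the location and value of its maximum if a maximum exists
 * within [-1, +1]; returns 0 otherwise. */
static int parabola_peak(float c_prev, float c_mid, float c_next,
                         float *loc_max, float *peak)
{
    float curv = c_prev - 2.0f * c_mid + c_next;   /* twice the t^2 coefficient */

    if (curv >= 0.0f)
        return 0;                                  /* flat or convex: no maximum */
    *loc_max = 0.5f * (c_prev - c_next) / curv;
    if (*loc_max < -1.0f || *loc_max > 1.0f)
        return 0;                                  /* outside the three values */
    *peak = c_mid - 0.25f * (c_prev - c_next) * (*loc_max);
    return 1;
}

/* corr[n] is the cross-correlation for shift -limit + n * sstep. Writes the
 * refined shift to *shift_out and returns 1, or returns 0 if every set of
 * three values was disregarded. */
int refine_shift(const float *corr, int n, float limit, float sstep,
                 float *shift_out)
{
    int mid, found = 0;
    float best_peak = 0.0f;

    assert(corr != NULL && shift_out != NULL && n >= 3);

    for (mid = 1; mid < n - 1; mid++) {
        float loc_max, peak;
        if (!parabola_peak(corr[mid - 1], corr[mid], corr[mid + 1],
                           &loc_max, &peak))
            continue;
        if (!found || peak > best_peak) {
            float shift = -limit + (float)mid * sstep;
            best_peak = peak;
            *shift_out = shift + sstep * loc_max;  /* shift + sstep * loc_max */
            found = 1;
        }
    }
    return found;
}
```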


[0086] At this point, the best shift of the original residual signal has been determined. This shift may then be used
to extend the shifted residual, x̃(i), for a duration L. Since this shift is known, the accumulated shift between the original
residual signal, x(i), and the shifted residual signal, x̃(i), may be updated as acc_shift = acc_shift + shift (see step 475).

[0087] With the accumulated shift updated, the shifted residual signal, x̃(i), is extended to match acc_shift with use
of the segment of the original residual signal corresponding to shift. Note that original residual sample values are
available only at original signal sample times. However, in determining an optimal shift of the original residual signal,
an upsampling has been performed prior to computing cross-correlations and a value loc_max (which is generally
noninteger) has been determined. In general this results in a noninteger sample time relationship between the shifted
residual signal x̃(i) and the original residual signal x(i) to be used in extending the shifted residual signal. Therefore,
bandlimited interpolation of the L-length segment of the original signal is used to provide sample values of the original
signal which are time-aligned with samples of the shifted residual. Once such time-alignment is performed, the samples
of this time-aligned signal may be concatenated with the existing shifted residual signal (see step 480).

[0088] Note that flow of control may have jumped to step 480 without updating the accumulated shift. In this case,
a length of L samples of the original signal is interpolated to provide samples for the shifted residual with the same
value of acc_shift as the previous shifted residual segment.

[0089] In either case, dpm is updated to reflect the extension of x̃(i) (see step 490).

[0090] As shown in Figure 9, once dpm is updated, the flow of control returns to step 305. As mentioned above, step
305 determines whether further processing is required to extend the shifted residual beyond the end of the current
subframe. If so, control flows through the process presented in steps 310-490 of Figure 9 again so that further extension
of the shifted residual may be performed. Steps 310-490 are repeated as long as the condition of step 305 is satisfied.
Once the shifted residual has been extended up to or beyond the end of the current adaptive codebook subframe, the
pointer to the end of the adaptive codebook subframe is updated (see step 500) and processing associated with
time-shifting the original residual ends.

[0091] Once x̃(i) is determined by time shift processor 200, a scale factor λ(i) is determined by processor 210 as follows:

λ(i) = [expression illegible in source]    (13)

where x̃(i) and x̂(i) are signals of length equal to a subframe. This scale factor is multiplied by x̂(i) and provided as
output from processor 200.

[0092] Referring again to Figure 2, x̃(i) and adaptive codebook estimate λ(i)x̂(i) are supplied to circuit 160 which
subtracts estimate λ(i)x̂(i) from modified original x̃(i). The result is excitation residual signal r(i) which is supplied to a
fixed stochastic codebook search processor 170.

[0093] Codebook search processor 170 operates conventionally to determine which of the fixed stochastic codebook
vectors, z(i), scaled by a factor, μ(i), most closely matches r(i) in a least squares, perceptually weighted sense. The
chosen scaled fixed codebook vector, μ(i)z_min(i), is added to the scaled adaptive codebook vector, λ(i)x̂(i), to yield the
best estimate of a current reconstructed speech signal, x̂(i). This best estimate, x̂(i), is stored by the adaptive codebook
processor 150 in its memory.

[0094] As is the case with conventional speech coders, adaptive codebook delay and scale factor values, λ and M,
an FSCB index, IpQ, and gain, and linear prediction coefficients, a_n, are communicated across a channel for
reconstruction by a conventional CELP decoder/receiver (see Figure 13). This communication is in the form of a signal
reflecting these parameters. Because of the reduced error (in the coding process) afforded by operation of the illustrative
embodiment of the present invention, it is possible to transmit adaptive codebook delay information, M, once per frame,
rather than once per subframe. Subframe values for delay may be provided at the receiver by interpolating the delay
values in a fashion identical to that done by delay estimator 140 of the transmitter.

[0095] By transmitting adaptive codebook delay information M every frame rather than every subframe, the bandwidth
requirements associated with delay may be significantly reduced.

[0096] As discussed above with reference to step 475 of Figure 9, acc_shift represents an accumulated shift over
time between the original signal, x(i), and the shifted signal, x̃(i). In order to prevent an ever increasing asynchrony
between these signals, the delay estimator 140 can adjust computed values for M over time. An adjustment process
suitable for this purpose carried out by estimator 140 is advantageously described with reference to Figure 12.

[0097] Figure 12 presents a finite-state machine having states A, B and C. The state of this machine represents an
amount of adjustment to computed values for M to prevent ever increasing asynchrony. Transitions between states
are based on values for acc_shift provided by time shift processor 200. When the machine is in state A, the delay value
M(FB_{n+1}) used to determine values for delays m_n(k) is not adjusted. When in state B, the machine adjusts M(FB_{n+1})
as follows: M(FB_{n+1}) = M(FB_{n+1}) + δ, where δ illustratively equals one sample time. When in state C, the machine adjusts
M(FB_{n+1}) as follows:

M(FB_{n+1}) = M(FB_{n+1}) - δ.

[0098] Given an initial state (A, B, or C), the finite state machine operates by keeping track of values of acc_shift. If
the value of acc_shift is such that a condition for transitioning between the current state and another state is met, a
transition to the other state occurs. For example, assuming the machine is in state A (an illustrative initial state for
estimator 140) and -3 ms < acc_shift < 3 ms, the machine would remain in state A and M(FB_{n+1}) would not be modified.
If the value of acc_shift exceeds 3 ms, the machine transitions to state C and M(FB_{n+1}) is incremented by one sample
time to help offset the asynchrony indicated by acc_shift. If, on the other hand, when in state A acc_shift becomes less
than -3 ms, the machine transitions to state B and M(FB_{n+1}) is decremented by one sample to help offset the asynchrony.
The operation is similar for states B and C.
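The transitions out of state A may be sketched as follows. Several points are assumptions made only for this example: the names are hypothetical; acc_shift and the threshold are expressed in samples (3 ms would be 24 samples at an assumed 8 kHz sampling rate); the transitions out of states B and C are defined by Figure 12, which is not reproduced in the text, so the sketch simply returns to state A whenever acc_shift lies inside the threshold band; and the sign of the adjustment in each state is taken from the state definitions given with Figure 12 (state B adds δ, state C subtracts δ).

```c
#include <assert.h>

enum adj_state { STATE_A, STATE_B, STATE_C };

/* Next state of the adjustment machine. acc_shift and threshold are in
 * samples. The return to STATE_A from STATE_B or STATE_C is a simplifying
 * assumption, not taken from Figure 12. */
enum adj_state next_state(enum adj_state current, float acc_shift,
                          float threshold)
{
    assert(threshold > 0.0f);
    (void)current;  /* under the simplifying assumption the input alone decides */

    if (acc_shift > threshold)
        return STATE_C;
    if (acc_shift < -threshold)
        return STATE_B;
    return STATE_A;
}

/* Delay value after adjustment, following the state definitions:
 * A leaves M unchanged, B adds delta, C subtracts delta. */
float adjusted_delay(enum adj_state state, float M, float delta)
{
    switch (state) {
    case STATE_B:
        return M + delta;
    case STATE_C:
        return M - delta;
    default:
        return M;
    }
}
```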

An Alternative Illustrative Embodiment


[0099] One alternative to the illustrative embodiment presented in Figure 2 is presented in Figure 11. In this
embodiment, a trial signal generator 610 receives an original digital speech signal, x(i), and generates a plurality of trial original
signals, x̃(i). The trial original signal generator 610 comprises a time-shift processor, similar to that presented in Figures
2, 7, and 9, but which does not perform a correlation between a trial original signal and an adaptive codebook
contribution. Rather, this time shift processor simply provides a plurality of L-length trial original signals based on a plurality
of shifts of original speech signal x(i). As discussed above with reference to Figure 10, these trial original signals are
L-length segments of the original signal determined by shifts of step size sstep over a range of ±limit with respect to
an L-length segment beginning at sample sfstart and ending at sample sfend. Because it performs no cross-correlation
between the original residual and trial original signals, generator 610 does not select a trial original signal for coding
on its own. Rather, it provides the trial original signals, x̃(i), it generates to a coder/synthesizer 620 for processing.

[0100] Coder/synthesizer 620 comprises a conventional analysis-by-synthesis coder, such as the conventional CELP
coder presented in Figure 1. The synthesized (or reconstructed) original signal, x̂(i), is that shown in Figure 1 as the
sum of the adaptive and fixed codebook output signals, e(i) + λ(i)x̂(i - d(i)) (see circuit 45 of Figure 1). The coded signal
parameters determined by the analysis processing of the CELP coder (from which the synthesized signal x̂(i) is
generated) may be saved in RAM for later use. The output of the coder/synthesizer 620, x̂(i), is thus an estimate of the
original signal, x(i), based on a given trial original signal, x̃(i). This estimate of the original signal is thereafter compared
with the trial original signal to determine a measure of the similarity between the estimated original, x̂(i), and the trial
original, x̃(i). This measure of similarity is provided by a subtraction circuit 630, which determines a difference (or error)
signal, E(i), between the two signals. The error signal E(i) is provided to the trial signal generator 610 which keeps
track of the error associated with a given trial original signal. Once all trial original signals have been processed in this
way, the trial signal generator may determine which trial signal, x̃(i), produced the best measure of similarity (e.g., the
smallest error). Thereafter, generator 610 may signal the coder/synthesizer 620 to use the saved code parameters
associated with the trial original signal having the smallest error. These parameters may be communicated to a receiver
as a coded representation of the original signal, x(i).

[0101] It will be understood by those of ordinary skill in the art that reference to signals such as the "original" signal,
"reconstructed" signal, etc., may include reference to segments thereof. Moreover, whether a given signal is upsampled
or not does not change its character as an "original" signal, a "trial original" signal, etc. Hence, use of the term "samples"
with reference to, e.g., an "original signal" may include those sample values of the signal provided by an upsampling
technique (such as conventional bandlimited interpolation), those samples which are not the result of upsampling, or
both.

Introduction to Appendix 


[0102] Attached as an appendix hereto is an illustrative set of software programs related to the first illustrative
embodiment discussed above. The software programs of this set are written in the "C" programming language. An
embodiment of this invention may be provided by executing these programs on a general purpose computer, for example,
the Iris Indigo work station marketed by Silicon Graphics, Inc. Note that subroutines "cshiftframe" and "modifyorig"
correspond generally to those functions presented in Figure 9.



/*
 * mod - modify residual
 */

void mod( residualm, accshift, d_shift, shiftr, exctation, residual,
          dp1, dpm, lpcw, lpcorder, delay, subframel, extra)
float *residualm;   /* output: modified residual signal */
float *accshift;    /* output: shift from mresidual to residual */
float *d_shift;     /* output: local shift for all samples */
float shiftr;       /* input: maximum shift range */
float *exctation;   /* input: adaptive codebook excitation */
float *residual;    /* input: original residual */
int dp1;            /* input: pointer to output signals */
int *dpm;           /* in/out: pointer to end of residualm */
float *lpcw;        /* input: weighted lpc coefficients */
int lpcorder;       /* input: lpc order */
float delay;        /* input: delay */
int subframel;      /* input: subframe length */
int extra;          /* input: additional excitation constructed */
{
    void cshiftframe();
    void modifyorig();
    float shiftr2;
    int sfstart, sfend;

    /* extend the modified residual until it reaches the end of the subframe */
    /* fcnt (frame counter, DEBUG) is not declared in this listing */
    while( *dpm < dp1+subframel) {
        cshiftframe( &sfstart, &sfend, &shiftr2, *dpm, residual, dp1,
                     *accshift, shiftr, delay, subframel, extra, fcnt);
        modifyorig( residualm, accshift, d_shift, dpm, shiftr2, exctation,
                    residual, sfstart, sfend);
    }
}



40 



4S 



SO 



55 
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1992 cshiftfr. 



: *sfs 
: *s£end 

: dpm; 

: dpi; 
lat acc: 

lat del. 



ftframe( sfstart, sfend, maxshift2, dpm, residual, dpi, 
maxshift, delay, subframei, extra, fcnt) 
/* output: shift-frame start */ 
/* output: shift-frame ending */ 
shift2,- /» output: one-sided shift range */ 

/* output: up to where residualm exists ' 
idual; /* input : original residual signal •/ 

/* input : output signal pointer */ 
hi ft; /* input : shift of output versus input ' 

hift; /* input : maximum shift range ' / 

y; /• input : local pitch value «/ 

mel; /» input : subframe length •/ 

I* input : additional excitation beyond c 
/* input : frame counter (DEBUG) */ 



rrent frme */ 



eLocO ; 



■ determine locatii 



of max 



: offset; 

. iacshift; 

. length; 

- Loc, loc2; 

: delay < 0) ( 
acshift = -accshift - 
acshift = -iacshift; 



iacshift = -accshift + 0.5; 

    /* determine first a pitch pulse somewhere near dpm */
    length = 1.5 * delay;
    offset = dpm + iacshift - 0.25 * delay;
    maxeloc( &loc, &maxener, residual, offset, length, 2);
    loc -= iacshift;
    printf("cshiftframe: firstloc %d ", loc - dpl);

    /* now find the first pitch pulse for sure */
    if( loc < dpm) {
        offset = loc + iacshift + 0.75 * delay + 0.5;
        length = 0.5 * delay;
        maxeloc( &loc, &maxener, residual, offset, length, 2);
        loc -= iacshift;
        printf(" Aloc %d", loc - dpl);
    }

    if( loc > dpm+delay) {
        offset = loc + iacshift - 1.25 * delay + 0.5;
        length = 0.5 * delay;
        maxeloc( &loc2, &maxener, residual, offset, length, 2);
        loc2 -= iacshift;
        if( loc2 >= dpm) loc = loc2;
        printf(" Bloc %d", loc - dpl);
    }

    *sfstart = dpm;
    *sfend = loc + extra;
    *maxshift2 = maxshift;

    if( *sfend > dpl + subframel + extra)
        *sfend = dpl + subframel + extra;

    if( loc >= dpl + subframel + extra/2)
        *sfend = dpl + subframel;

    if( loc >= *sfend || loc < *sfstart)
        *maxshift2 = 0;

    printf(" loc is: %d\n", loc-dpl);
    /* debugging pictures */
    {
        int i;
        char title1[100];
        static float w1[200], w2[200];

        for( i=0; i<200; i++) w1[i] = 0.0;
        w1[loc-dpl-1] = 50.0; w1[loc-dpl+1] = 50.0; w1[loc-dpl] = 100;
        for( i=0; i<subframel+extra; i++) w2[i] = residual[dpl+iacshift+i];
        for( i=0; i<*sfstart-dpl; i++) w2[i] = 0.0;
        for( i=*sfend-dpl; i<subframel+extra; i++) w2[i] = 0.0;
        sprintf( title1, "shiftrange %5.3f", *maxshift2);
        pictures3( residual+dpl+iacshift, subframel+extra, w1, subframel+extra,
                   w2, subframel+extra, fcnt, title1, "considered", "shifted");
    }
}
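
The bounds logic at the end of cshiftframe can be read in isolation. The following is a minimal sketch in ANSI C, not part of the original listing: the struct `shift_frame` and the function `shift_frame_bounds` are hypothetical names introduced here, and the pitch-pulse location `loc` is assumed to have been found already.

```c
#include <assert.h>

/* Hypothetical container for the three outputs of cshiftframe. */
struct shift_frame {
    int   start;      /* first sample of the shift frame */
    int   end;        /* one past the last sample considered */
    float max_shift;  /* one-sided shift range; 0 disables shifting */
};

/* Mirrors the bounds logic of the listing: the frame starts at dpm,
 * ends `extra` samples after the pitch pulse at `loc`, and is clamped
 * to the current subframe. */
struct shift_frame shift_frame_bounds(int loc, int dpm, int dpl,
                                      int subframel, int extra,
                                      float maxshift)
{
    struct shift_frame sf;

    assert(subframel > 0 && extra >= 0);
    sf.start = dpm;
    sf.end = loc + extra;
    sf.max_shift = maxshift;

    /* never look beyond the excitation that exists */
    if (sf.end > dpl + subframel + extra)
        sf.end = dpl + subframel + extra;

    /* pulse too close to the end: stop at the subframe boundary */
    if (loc >= dpl + subframel + extra / 2)
        sf.end = dpl + subframel;

    /* no pulse inside the frame: do not shift at all */
    if (loc >= sf.end || loc < sf.start)
        sf.max_shift = 0.0f;

    return sf;
}
```

With a subframe of 40 samples and `extra` of 10, a pulse at sample 30 keeps the full shift range, while a pulse at sample 48 falls outside the clamped frame and sets the range to zero.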



maxeloc.c Page 1

#include "macro.h"

void maxeloc( maxloc, maxener, signal, dp, length, ewl)
int *maxloc;        /* output: location of maximum energy */
float *maxener;     /* output: energy at loc */
float *signal;      /* input: signal for which energy is to be */
int dp;             /* input : data pointer in */
int length;         /* input: window of data */
int ewl;            /* input: half length of energy window */
{
    float ener;
    register int i, tail, front;

    ener = 0.0;
    front = dp + ewl;
    tail = dp - ewl;
    for( i=tail; i<=front; i++)
        ener += signal[i] * signal[i];
    *maxloc = dp;
    *maxener = ener;
    for( i=1; i<length; i++) {
        front++;
        ener += signal[front] * signal[front] - signal[tail] * signal[tail];
        tail++;
        if( *maxener < ener) {
            *maxloc = i + dp;
            *maxener = ener;
        }
    }
}
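
The sliding-window energy search used by maxeloc can be sketched with an ANSI C prototype as follows. This is an illustrative rewrite, not the original code: the name `max_energy_location` and the choice to return the location instead of writing it through a pointer are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index in [dp, dp+length) whose window [i-ewl, i+ewl] has
 * the largest energy; the energy itself is stored in *maxener.
 * The caller must guarantee that signal[dp-ewl .. dp+length-1+ewl]
 * is valid memory. */
int max_energy_location(const float *signal, int dp, int length, int ewl,
                        float *maxener)
{
    int i, tail = dp - ewl, front = dp + ewl, maxloc = dp;
    float ener = 0.0f;

    assert(signal != NULL && maxener != NULL && length > 0 && ewl >= 0);

    /* energy of the first window */
    for (i = tail; i <= front; i++)
        ener += signal[i] * signal[i];
    *maxener = ener;

    /* slide the window: add the new front sample, drop the old tail */
    for (i = 1; i < length; i++) {
        front++;
        ener += signal[front] * signal[front] - signal[tail] * signal[tail];
        tail++;
        if (ener > *maxener) {
            maxloc = dp + i;
            *maxener = ener;
        }
    }
    return maxloc;
}
```

Updating the running sum costs two multiplications per position instead of recomputing all 2*ewl+1 products, which is presumably why the listing is written this way.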
#include "macro.h"
/*
 * modifyorig - modify original
 */

void modifyorig( residualm, accshift, d_shift, dpm, shiftrange, exctation,
                 residual, dpl, sfend)
float *residualm;   /* output: modified residual signal */
float *accshift;    /* in/out: accumulated shift */
float *d_shift;     /* output: local shift value */
int *dpm;           /* output: first nonvalid sample of residualm */
float shiftrange;   /* input : one side of shift range */
float *exctation;   /* input : excitation waveform */
float *residual;    /* input : original residual signal */
int dpl;            /* input : */
int sfend;          /* input : window end */
{
    void bl_intrp();
    void getcrit();
    void testi_ubound();

    float criterion, best;
    float shift;
    float optshift;
    float locmax;

    int k;
    int leftlimit, rightlimit;
    int length;
#define MAXDIM 100

    float crit[MAXDIM];
    float a, b;
    float sstep;

    length = sfend - dpl;

    /* first we upsample by a factor 2 */
    sstep = 0.5;

    rightlimit = shiftrange/sstep + 0.5;
    leftlimit = -rightlimit;

    if( leftlimit == rightlimit) rightlimit = leftlimit - 1;
    printf("modifyorig: llim %d rlim %d", leftlimit, rightlimit);
    testi_ubound( rightlimit*2+1, MAXDIM, "modifyorig.c1");
    for( k=leftlimit; k<=rightlimit; k++) {
        shift = *accshift + k * sstep;
        getcrit( crit+k-leftlimit, residual+dpl, exctation+dpl, shift, length);
    }

    /* then we interpolate the criterion */
    best = 0.0;
    optshift = *accshift;

    for( k=leftlimit+1; k<rightlimit; k++) {
        shift = *accshift + k * sstep;
        a = crit[k-leftlimit+1] + crit[k-leftlimit-1] - 2.0 * crit[k-leftlimit];
        criterion = -2.0;
        if( a != 0.0) {
            b = crit[k-leftlimit+1] - crit[k-leftlimit-1];
            locmax = - b / (2.0 * a);
            if( locmax <= 0.5 && locmax >= -0.5)
                criterion = a * locmax * locmax + b * locmax + 2.0 * crit[k-leftlimit];
        }
        if( criterion > best) {
            optshift = shift + sstep * locmax;
            best = criterion;
        }
    }

    *accshift = optshift;

    printf(" optshift %5.2f best %.4e\n", optshift, best);
    if( best < -1.0)
        for( k=leftlimit+1; k<rightlimit; k++)
            printf("k=%d %f\n", k, crit[k-leftlimit]);
    for( k=0; k<length; k++) {
        bl_intrp( residualm+dpl+k, residual+dpl+k, *accshift, 0.9, 8);
        d_shift[dpl+k] = *accshift;
    }

    *dpm = dpl + length;
}
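
The interpolation step in modifyorig fits a parabola through three neighbouring criterion values and takes its vertex as the refined shift. A minimal sketch of that fit follows; `parabolic_peak` is a hypothetical helper name, and unlike the listing (which works with twice the interpolated value, `a*x*x + b*x + 2*crit`) it returns the interpolated value itself.

```c
#include <assert.h>

/* Fits a parabola through three equally spaced samples (cm at -1,
 * c0 at 0, cp at +1) and returns the offset of its extremum in grid
 * units. The interpolated value at that offset is stored in *value.
 * Returns 0 and leaves *value at c0 when the samples are collinear. */
float parabolic_peak(float cm, float c0, float cp, float *value)
{
    float a = cp + cm - 2.0f * c0;   /* twice the quadratic coefficient */
    float b = cp - cm;               /* twice the linear coefficient */
    float x;

    assert(value != 0);
    *value = c0;
    if (a == 0.0f)
        return 0.0f;

    x = -b / (2.0f * a);             /* vertex of the parabola */
    *value = 0.5f * (a * x * x + b * x) + c0;
    return x;
}
```

In the listing the offset is accepted only when it lies within half a grid step of the centre sample, and the final shift is `shift + sstep * locmax`, so an offset of 0.25 on the half-sample grid corresponds to an eighth of a sample.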
/*
 * bl_intrp - band-limited interpolation
 */

void bl_intrp( output, input, delay, factor, fl)
float *output;      /* output: interpolated output value */
float *input;       /* input : array to be interpolated */
float delay;        /* input : delay where actual input is */
float factor;       /* input : cut-off frequency (relative to fs) */
int fl;             /* input : filter length is 2*fl+1 */
{
    /* NOTES
     * computes "input" signal value
     * at "delay" prior to the array pointer "input" into the "input" array
     */
    register int n;
    register float t;
    register float *f;
    register float arg1, arg3;
    register float denom;
    int offset;

    if( delay < 0) {
        offset = -delay + 0.5;
        offset = -offset;
    }
    else
        offset = delay + 0.5;
    t = offset - delay;
    f = input - offset;     /* center sum around f */
    denom = 2.0 / (2.0 * fl + 1.0);
    *output = 0.0;
    for( n= -fl; n<=fl; n++) {
        arg1 = PI * factor * (t-n);
        arg3 = PI * (t-n);
        if( arg1 < 1.e-2 && arg1 > -1.e-2)   /* just copy */
            *output += factor * *(f + n);
        else    /* sinc function multiplied by hamming window */
            *output += factor * (0.54 + 0.46 * cos( arg3 * denom )) *
                       *(f+n) * sin( arg1) / arg1;
    }
}



/*
 * testi_ubound - test if argument a exceeds int boundary b and print text
 */

void testi_ubound( a, b, text)
int a;              /* input: value to be tested */
int b;              /* input: boundary value */
char *text;         /* input: program name */
{
    if( a > b) {
        printf("\n%s-f-value exceeds range %d > %d\n", text, a, b);
        exit(10);
    }
}

/*
 * testi_bound - test if argument a exceeds range b1,b2 and print text
 */

void testi_bound( a, b1, b2, text)
int a;              /* input: value to be tested */
int b1,b2;          /* input: boundary values */
char *text;         /* input: program name */
{
    if( a < b1) {
        printf("\n%s-f-value exceeds range %d < %d\n", text, a, b1);
        exit(10);
    }
    else if( a > b2) {
        printf("\n%s-f-value exceeds range %d > %d\n", text, a, b2);
        exit(10);
    }
}

/*
 * testf_bound - test if argument a exceeds range b1,b2 and print text
 */

void testf_bound( a, b1, b2, text)
float a;            /* input: value to be tested */
float b1,b2;        /* input: boundary values */
char *text;         /* input: program name */
{
    if( a < b1) {
        printf("\n%s-f-value exceeds range %f < %f\n", text, a, b1);
        exit(10);
    }
    else if( a > b2) {
        printf("\n%s-f-value exceeds range %f > %f\n", text, a, b2);
        exit(10);
    }
}

/*
 * testd_bound - test if argument a exceeds range b1,b2 and print text
 */

void testd_bound( a, b1, b2, text)
double a;           /* input: value to be tested */
double b1,b2;       /* input: boundary values */
char *text;         /* input: program name */
{
    if( a < b1) {
        printf("\n%s-f-value exceeds range %f < %f\n", text, a, b1);
        exit(10);
    }
    else if( a > b2) {
        printf("\n%s-f-value exceeds range %f > %f\n", text, a, b2);
        exit(10);
    }
}

Nov 16 09:07 1992 getcrit.c Page 1

/*
 * getcrit - compute error between excitation and shifted residual
 */

void getcrit( criterion, residual, exctation, shift, length)
float *criterion;   /* output: error criterion */
float *residual;    /* input : residual signal */
float *exctation;   /* input : reference signal */
float shift;        /* input : shift */
int length;         /* input : vector length */
{
    void bl_intrp();
    float output;
    register int i;

    *criterion = 0.0;
    for( i=0; i<length; i++) {
        bl_intrp( &output, residual+i, shift, 0.9, 8);
        *criterion += output * exctation[i];
    }
}
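
Although its header comment speaks of an error, getcrit accumulates a cross-correlation between the shifted residual and the excitation, and modifyorig keeps the shift that maximizes it. The sketch below illustrates that search for integer shifts only, so that it needs no interpolator; `shifted_correlation` and `best_integer_shift` are hypothetical names, and the fractional shifts of the listing are deliberately left out.

```c
#include <assert.h>

/* Cross-correlation between `reference` and `signal` delayed by an
 * integer `shift` (a positive shift reads earlier samples, following
 * the delay convention of bl_intrp). The caller must guarantee that
 * signal[-shift .. length-1-shift] is valid memory. */
double shifted_correlation(const float *signal, const float *reference,
                           int shift, int length)
{
    double sum = 0.0;
    int i;

    assert(signal != 0 && reference != 0 && length > 0);
    for (i = 0; i < length; i++)
        sum += (double)signal[i - shift] * reference[i];
    return sum;
}

/* Tries every shift in [-range, range] and returns the one with the
 * largest correlation; a zero shift wins ties against later candidates. */
int best_integer_shift(const float *signal, const float *reference,
                       int range, int length)
{
    int k, best_k = 0;
    double best = shifted_correlation(signal, reference, 0, length);

    for (k = -range; k <= range; k++) {
        double c = shifted_correlation(signal, reference, k, length);
        if (c > best) {
            best = c;
            best_k = k;
        }
    }
    return best_k;
}
```

If the signal has a pulse two samples earlier than the reference, the search returns a shift of 2, which is the delay that lines the two pulses up.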



Claims 

1. A method for analysis-by-synthesis coding of an original speech signal, the method comprising the steps of: 

a. identifying (315-375) one or more samples of the original signal based on a sample identification criterion; 

b. selecting (380-400) a segment of the original signal to form a trial original signal, the segment including one 
or more of the identified samples; 

c. for each of a plurality of trial original signals, evaluating a measure of similarity between the trial original 
signal and a synthesized signal by forming a cross-correlation between the trial original signal and the syn- 
thesized signal (425); 

d. determining (450) a trial original signal for use in coding based on one or more evaluated measures of 
similarity; and 

e. generating a signal reflecting a coded representation of the original signal, the signal generation based on 
one or more determined trial original signals. 

2. The method of claim 1 further comprising the steps of: 

1. analyzing one or more trial original signals to produce one or more parameters representative thereof; and 

2. synthesizing a signal which estimates the original signal, the synthesis based on one or more of the parameters. 

3. The method of claim 1 wherein the step of identifying one or more samples of the original signal comprises analyzing 
the original signal to locate a local energy maximum. 

4. The method of claim 1 wherein the selected segment of the original signal comprises original signal samples other 
than the identified signal samples. 

5. The method of claim 4 wherein the selected segment comprises identified samples preceding one or more other 
original signal samples. 

6. The method of claim 1 wherein the step of selecting a segment comprises: 

1. determining a time shift with reference to one or more samples of the original signal; and 

2. determining a set of original signal samples based on the time shift. 

7. The method of claim 1 wherein the step of determining a trial original signal for use in coding comprises the step 
of selecting a trial original signal from among the plurality of trial original signals, the selection of the trial original 
signal based upon a comparison of evaluated measures of similarity. 

8. The method of claim 1 wherein the step of determining a trial original signal for use in coding comprises the step 
of generating a trial original signal based on evaluated measures of similarity. 

9. The method of claim 8 wherein the step of generating a trial original signal comprises: 

1. determining a substantially maximum measure of similarity from among a plurality of trial original signal 
similarity measures; and 

2. determining a time-shift reflecting the substantial maximum measure of similarity. 

10. The method of claim 9, wherein the step of generating a trial original signal further comprises determining sample 
values for the trial original signal based on a formed trial original signal and the time-shift. 

11. The method of claim 9 wherein the step of generating a trial original signal further comprises determining sample 
values for the trial original signal based on the original signal and the time-shift. 

12. The method of claim 1 wherein the step of generating a signal reflecting a coded representation of the original 
signal comprises coding one or more determined trial original signals. 

13. The method of claim 12 wherein the step of coding one or more trial original signals comprises performing 
analysis-by-synthesis coding. 

14. The method of claim 13 wherein the step of performing analysis-by-synthesis coding comprises performing 
code-excited linear prediction coding. 

15. An apparatus for analysis-by-synthesis coding an original speech signal, the apparatus comprising: 

a. means (140,150) for identifying one or more samples of the original signal based on a sample identification 
criterion; 

b. means (150) for selecting a segment of the original signal to form a trial original signal, the segment including 
one or more of the identified samples; 

c. means (200) for evaluating a measure of similarity by forming a cross-correlation between each of a plurality 
of trial original signals and a synthesized signal; 

d. means (200) for determining a trial original signal for use in coding based on one or more evaluated measures 
of similarity; and 

e. means (160,170) for generating a signal reflecting a coded representation of the original signal, the signal 
generation based on one or more determined trial original signals. 

16. The apparatus of claim 15 further comprising: 

1. means for analyzing one or more trial original signals to produce one or more parameters representative 
thereof; and 

2. means for synthesizing a signal which estimates the original signal, the synthesis based on one or more of 
the parameters. 

17. The apparatus of claim 15 wherein the means for identifying one or more samples of the original signal comprises 
a means for analyzing the original signal to locate a local energy maximum. 

18. The apparatus of claim 15 wherein the means for selecting a segment comprises: 

1 . means for determining a time shift with reference to one or more samples of the original signal; and 

2. means for determining a set of original signal samples based on the time shift. 

19. The apparatus of claim 15 wherein the means for generating a signal reflecting a coded representation of the 
original signal comprises means for coding one or more determined trial original signals. 

20. The apparatus of claim 19 wherein the means for coding one or more trial original signals comprises means for 
performing analysis-by-synthesis coding. 

21. The apparatus of claim 20 wherein the means for performing analysis-by-synthesis coding comprises means for 
performing code-excited linear prediction coding. 

Patentansprüche 

1. Verfahren zur Analyse-durch-Synthese-Codierung eines Original-Sprachsignals, mit den folgenden Schritten: 

a. Identifizieren (315-375) eines oder mehrerer Abtastwerte des Originalsignals auf der Grundlage eines 
Abtastwertidentifizierungskriteriums; 

b. Auswählen (380-400) eines Segments des Originalsignals zur Bildung eines Versuchs-Originalsignals, 
wobei das Segment einen oder mehrere der identifizierten Abtastwerte enthält; 

c. für jedes von mehreren Versuchs-Originalsignalen, Auswerten eines Maßes der Ähnlichkeit zwischen dem 
Versuchs-Originalsignal und einem synthetisierten Signal durch Bilden einer Kreuzkorrelation zwischen dem 
Versuchs-Originalsignal und dem synthetisierten Signal (425); 

d. Bestimmen (450) eines Versuchs-Originalsignals zur Verwendung bei der Codierung auf der Grundlage 
eines oder mehrerer ausgewerteter Maße der Ähnlichkeit; und 

e. Erzeugen eines Signals, das eine codierte Darstellung des Originalsignals widerspiegelt, wobei die 
Signalerzeugung auf der Grundlage eines oder mehrerer bestimmter Versuchs-Originalsignale erfolgt. 

2. Verfahren nach Anspruch 1, weiterhin mit den folgenden Schritten: 

1. Analysieren eines oder mehrerer Versuchs-Originalsignale zur Erzeugung eines oder mehrerer diese 
darstellender Parameter; und 

2. Synthetisieren eines Signals, das das Originalsignal abschätzt, wobei die Synthese auf der Grundlage eines 
oder mehrerer der Parameter erfolgt. 

3. Verfahren nach Anspruch 1, wobei der Schritt des Identifizierens eines oder mehrerer Abtastwerte des 
Originalsignals das Analysieren des Originalsignals zur Auffindung eines lokalen Energiemaximums umfaßt. 

4. Verfahren nach Anspruch 1, wobei das ausgewählte Segment des Originalsignals von den identifizierten 
Signalabtastwerten verschiedene Originalsignalabtastwerte umfaßt. 

5. Verfahren nach Anspruch 4, wobei das ausgewählte Segment identifizierte Abtastwerte umfaßt, die einem oder 
mehreren weiteren Originalsignalabtastwerten vorausgehen. 

6. Verfahren nach Anspruch 1, wobei der Schritt des Auswählens eines Segments folgendes umfaßt: 






EP 0 602 826 B1 



1. Bestimmen einer Zeitverschiebung mit Bezug auf einen oder mehrere Abtastwerte des Originalsignals; und 

2. Bestimmen einer Menge von Originalsignalabtastwerten auf der Grundlage der Zeitverschiebung. 

7. Verfahren nach Anspruch 1, wobei der Schritt des Bestimmens eines Versuchs-Originalsignals zur Verwendung 
bei der Codierung den Schritt des Auswählens eines Versuchs-Originalsignals aus den mehreren 
Versuchs-Originalsignalen umfaßt, wobei die Auswahl des Versuchs-Originalsignals auf der Grundlage eines 
Vergleichs ausgewerteter Maße der Ähnlichkeit erfolgt. 

8. Verfahren nach Anspruch 1, wobei der Schritt des Bestimmens eines Versuchs-Originalsignals zur Verwendung 
bei der Codierung den Schritt des Erzeugens eines Versuchs-Originalsignals auf der Grundlage ausgewerteter 
Maße der Ähnlichkeit umfaßt. 

9. Verfahren nach Anspruch 8, wobei der Schritt des Erzeugens eines Versuchs-Originalsignals folgendes umfaßt: 

1. Bestimmen eines im wesentlichen maximalen Maßes der Ähnlichkeit aus mehreren der 
Versuchs-Originalsignalähnlichkeitsmaße; und 

2. Bestimmen einer Zeitverschiebung, die das im wesentlichen maximale Maß der Ähnlichkeit widerspiegelt. 

10. Verfahren nach Anspruch 9, wobei der Schritt des Erzeugens eines Versuchs-Originalsignals weiterhin das 
Bestimmen von Abtastwerten für das Versuchs-Originalsignal auf der Grundlage eines gebildeten 
Versuchs-Originalsignals und der Zeitverschiebung umfaßt. 

11. Verfahren nach Anspruch 9, wobei der Schritt des Erzeugens eines Versuchs-Originalsignals weiterhin das 
Bestimmen von Abtastwerten für das Versuchs-Originalsignal auf der Grundlage des Originalsignals und der 
Zeitverschiebung umfaßt. 

12. Verfahren nach Anspruch 1, wobei der Schritt des Erzeugens eines Signals, das eine codierte Darstellung des 
Originalsignals widerspiegelt, das Codieren eines oder mehrerer bestimmter Versuchs-Originalsignale umfaßt. 

13. Verfahren nach Anspruch 12, wobei der Schritt des Codierens eines oder mehrerer Versuchs-Originalsignale die 
Durchführung einer Analyse-durch-Synthese-Codierung umfaßt. 

14. Verfahren nach Anspruch 13, wobei der Schritt des Durchführens der Analyse-durch-Synthese-Codierung die 
Durchführung einer Codierung mit codeerregter linearer Prädiktion umfaßt. 

15. Vorrichtung zur Analyse-durch-Synthese-Codierung eines Original-Sprachsignals, mit: 

a. einem Mittel (140, 150) zum Identifizieren eines oder mehrerer Abtastwerte des Originalsignals auf der 
Grundlage eines Abtastwertidentifizierungskriteriums; 

b. einem Mittel (150) zum Auswählen eines Segments des Originalsignals zur Bildung eines 
Versuchs-Originalsignals, wobei das Segment einen oder mehrere der identifizierten Abtastwerte enthält; 

c. einem Mittel (200) zum Auswerten eines Maßes der Ähnlichkeit durch Bilden einer Kreuzkorrelation 
zwischen jedem von mehreren Versuchs-Originalsignalen und einem synthetisierten Signal; 

d. einem Mittel (200) zum Bestimmen eines Versuchs-Originalsignals zur Verwendung bei der Codierung auf 
der Grundlage eines oder mehrerer ausgewerteter Maße der Ähnlichkeit; und 

e. einem Mittel (160, 170) zum Erzeugen eines Signals, das eine codierte Darstellung des Originalsignals 
widerspiegelt, wobei die Signalerzeugung auf der Grundlage eines oder mehrerer bestimmter 
Versuchs-Originalsignale erfolgt. 

16. Vorrichtung nach Anspruch 15, weiterhin mit: 

1. einem Mittel zum Analysieren eines oder mehrerer Versuchs-Originalsignale zur Erzeugung eines oder 
mehrerer diese darstellender Parameter; und 

2. einem Mittel zum Synthetisieren eines Signals, das das Originalsignal abschätzt, wobei die Synthese auf 
der Grundlage eines oder mehrerer der Parameter erfolgt. 

17. Vorrichtung nach Anspruch 15, wobei das Mittel zum Identifizieren eines oder mehrerer Abtastwerte des 
Originalsignals ein Mittel zum Analysieren des Originalsignals zum Auffinden eines lokalen Energiemaximums umfaßt. 

18. Vorrichtung nach Anspruch 15, wobei das Mittel zum Auswählen eines Segments folgendes umfaßt: 

1. ein Mittel zum Bestimmen einer Zeitverschiebung mit Bezug auf einen oder mehrere Abtastwerte des 
Originalsignals; und 

2. ein Mittel zum Bestimmen einer Menge von Originalsignalabtastwerten auf der Grundlage der 
Zeitverschiebung. 

19. Vorrichtung nach Anspruch 15, wobei das Mittel zum Erzeugen eines Signals, das eine codierte Darstellung des 
Originalsignals widerspiegelt, ein Mittel zum Codieren eines oder mehrerer bestimmter Versuchs-Originalsignale 
umfaßt. 

20. Vorrichtung nach Anspruch 19, wobei das Mittel zum Codieren eines oder mehrerer Versuchs-Originalsignale ein 
Mittel zum Durchführen einer Analyse-durch-Synthese-Codierung umfaßt. 

21. Vorrichtung nach Anspruch 20, wobei das Mittel zum Durchführen einer Analyse-durch-Synthese-Codierung ein 
Mittel zum Durchführen einer Codierung mit codeerregter linearer Prädiktion umfaßt. 



Revendications 

1. Méthode de codage avec analyse par synthèse d'un signal de parole initial, la méthode comprenant les étapes de: 

a. identification (315-375) d'un ou plusieurs échantillons du signal initial en fonction d'un critère d'identification 
d'échantillon ; 

b. sélection (380-400) d'un segment du signal initial en vue de former un signal initial test, le segment 
comportant un ou plusieurs des échantillons identifiés ; 

c. pour chacun d'une pluralité de signaux initiaux tests, évaluation d'une mesure de similarité entre le signal 
initial test et un signal synthétisé en formant une intercorrélation entre le signal initial test et le signal synthétisé 
(425) ; 

d. détermination (450) d'un signal initial test destiné à être utilisé dans le codage en fonction d'une ou plusieurs 
mesures évaluées de similarité ; et 

e. génération d'un signal reflétant une représentation codée du signal initial, la génération de signal étant 
basée sur un ou plusieurs signaux initiaux tests déterminés. 

2. Méthode selon la revendication 1, comprenant en outre les étapes de: 

1. analyse d'un ou plusieurs signaux initiaux tests en vue de produire un ou plusieurs paramètres représentatifs 
de ceux-ci ; et 

2. synthétisation d'un signal qui estime le signal initial, la synthèse étant basée sur un ou plusieurs des 
paramètres. 

3. Méthode selon la revendication 1, dans laquelle l'étape d'identification d'un ou plusieurs échantillons du signal 
initial comprend l'analyse du signal initial en vue de localiser un maximum d'énergie local. 

4. Méthode selon la revendication 1, dans laquelle le segment sélectionné du signal initial comprend des échantillons 
de signal initial autres que les échantillons de signal identifiés. 

5. Méthode selon la revendication 4, dans laquelle le segment sélectionné comprend des échantillons identifiés 
précédant un ou plusieurs autres échantillons de signal initial. 

6. Méthode selon la revendication 1, dans laquelle l'étape de sélection d'un segment comprend: 

1. la détermination d'un décalage de temps en référence à un ou plusieurs échantillons du signal initial ; et 

2. la détermination d'un ensemble d'échantillons de signal initial en fonction du décalage de temps. 

7. Méthode selon la revendication 1, dans laquelle l'étape de détermination d'un signal initial test destiné à être utilisé 
dans le codage comprend l'étape de sélection d'un signal initial test parmi la pluralité de signaux initiaux tests, la 
sélection du signal initial test étant basée sur une comparaison de mesures évaluées de similarité. 

8. Méthode selon la revendication 1, dans laquelle l'étape de détermination d'un signal initial test destiné à être utilisé 
dans le codage comprend l'étape de génération d'un signal initial test en fonction de mesures évaluées de similarité. 

9. Méthode selon la revendication 8, dans laquelle l'étape de génération d'un signal initial test comprend: 

1. la détermination d'une mesure substantiellement maximum de similarité parmi une pluralité de mesures de 
similarité du signal initial test ; et 

2. la détermination d'un décalage de temps reflétant la mesure substantielle maximum de similarité. 

10. Méthode selon la revendication 9, dans laquelle l'étape de génération d'un signal initial test comprend en outre la 
détermination de valeurs d'échantillons du signal initial test en fonction d'un signal initial test formé et du décalage 
de temps. 

11. Méthode selon la revendication 9, dans laquelle l'étape de génération d'un signal initial test comprend en outre la 
détermination de valeurs d'échantillons du signal initial test en fonction du signal initial et du décalage de temps. 

12. Méthode selon la revendication 1, dans laquelle l'étape de génération d'un signal reflétant une représentation 
codée du signal initial comprend le codage d'un ou plusieurs signaux initiaux tests déterminés. 

13. Méthode selon la revendication 12, dans laquelle l'étape de codage d'un ou plusieurs signaux initiaux tests 
comprend l'exécution d'un codage avec analyse par synthèse. 

14. Méthode selon la revendication 13, dans laquelle l'étape d'exécution du codage avec analyse par synthèse 
comprend l'exécution d'un codage prédictif linéaire à code excité. 

15. Appareil de codage avec analyse par synthèse d'un signal de parole initial, l'appareil comprenant: 

a. un moyen (140, 150) pour identifier un ou plusieurs échantillons du signal initial en fonction d'un critère 
d'identification d'échantillon ; 

b. un moyen (150) pour sélectionner un segment du signal initial en vue de former un signal initial test, le 
segment comportant un ou plusieurs des échantillons identifiés ; 

c. un moyen (200) pour évaluer une mesure de similarité en formant une intercorrélation entre chacun d'une 
pluralité de signaux initiaux tests et un signal synthétisé ; 

d. un moyen (200) pour déterminer un signal initial test destiné à être utilisé dans le codage en fonction d'une 
ou plusieurs mesures évaluées de similarité ; et 

e. un moyen (160, 170) pour générer un signal reflétant une représentation codée du signal initial, la génération 
de signal étant basée sur un ou plusieurs signaux initiaux tests déterminés. 

16. Appareil selon la revendication 15, comprenant en outre: 

1. un moyen pour analyser un ou plusieurs signaux initiaux tests en vue de produire un ou plusieurs paramètres 
représentatifs de ceux-ci ; et 

2. un moyen pour synthétiser un signal qui estime le signal initial, la synthèse étant basée sur un ou plusieurs 
des paramètres. 

17. Appareil selon la revendication 15, dans lequel le moyen d'identification d'un ou plusieurs échantillons du signal 
initial comprend un moyen pour analyser le signal initial en vue de localiser un maximum d'énergie local. 

18. Appareil selon la revendication 15, dans lequel le moyen de sélection d'un segment comprend: 

1. un moyen pour déterminer un décalage de temps en référence à un ou plusieurs échantillons du signal 
initial ; et 

2. un moyen pour déterminer un ensemble d'échantillons de signal initial en fonction du décalage de temps. 

19. Appareil selon la revendication 15, dans lequel le moyen pour générer un signal reflétant une représentation codée 
du signal initial comprend un moyen pour coder un ou plusieurs signaux initiaux tests déterminés. 

20. Appareil selon la revendication 19, dans lequel le moyen de codage d'un ou plusieurs signaux initiaux tests 
comprend un moyen pour exécuter un codage avec analyse par synthèse. 

21. Appareil selon la revendication 20, dans lequel le moyen d'exécution du codage avec analyse par synthèse 
comprend un moyen pour exécuter un codage prédictif linéaire à code excité. 